DOCUMENT RESUME

ED 209 343                                        TM 810 094

AUTHOR        Haney, Walt; Gelberg, Wendy
TITLE         Assessment in Early Childhood Education.
SPONS AGENCY  Department of Education, Washington, D.C.
PUB DATE      December 1980
NOTE          148p.
EDRS PRICE    MF01/PC06 Plus Postage.
DESCRIPTORS   *Early Childhood Education; *Educational Assessment;
              Evaluation Methods; *Measurement Techniques;
              [...] (Individuals); *Program Evaluation; Testing
IDENTIFIERS   *Elementary Secondary Education Act Title I

ABSTRACT
The goal of this booklet is to describe some of the [...] of early
childhood instruments and [...] information on each, annotations to
illustrate how potentially useful instruments can be initially screened [...]







U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

[X] This document has been reproduced as received from the person or
    organization originating it.
[ ] Minor changes have been made to improve reproduction quality.

• Points of view or opinions stated in this document do not necessarily
  represent official NIE position or policy.



ASSESSMENT IN EARLY CHILDHOOD EDUCATION



December 1980 



Walt Haney and Wendy Gelberg 
The Huron Institute 
123 Mt. Auburn Street
Cambridge, Massachusetts 02138 







FOREWORD 

This booklet has been prepared as part of a project sponsored by the 
United States Education Department (USED) on evaluation in early childhood 
Title I (ECT-I) programs. It is one of a series of resource books developed 
in response to concerns expressed by state and local personnel about early 
childhood Title I programs. The series describes an array of diverse 
evaluation activities and outlines how each of these might contribute to
improving local programs. The series revolves around a set of questions:

• Who will use the evaluation results?

• What kinds of information are users likely to find most helpful?

• In what ways might this information aid in program improvement?

• Are the potential benefits substantial enough to justify the cost
and effort of evaluation?

Together, the resource books address a range of issues relevant to the 

evaluation of early childhood programs for educationally disadvantaged 

children. The series comprises the following volumes: 

• Evaluating Title I Early Childhood Programs: An Overview

• Assessment in Early Childhood Education

• Short-Term Impact Evaluation of Early Childhood Title I Programs

• An Introduction to the Value-Added Model and Its Use in Short-Term
Impact Assessment

• Evaluation Approaches: A Focus on Improving Early Childhood Title
I Programs

• Longitudinal Evaluation Systems for Early Childhood Title I Programs

• Evaluating Title I Parent Education Program

The development of this series follows extensive field work on ECT-I 
programs (Yurchak & Bryk, 1979). In the course of that research, we



identified a number of concerns that SEA and LEA officials had about ECT-I 
programs, and the kinds of information that might be helpful in addressing 
them. Each resource book in the series thus deals with a specific concern 
or set of concerns. The books and the evaluation approaches they describe 
do not, however, constitute a comprehensive evaluation system to be uniformly 
applied by all. Our feasibility analysis (Bryk, Apling, & Mathews, 1978)
indicated that such a system could not efficiently respond to the specific
issues of interest in any single district at any given time. Rather, LEA
personnel might wish to draw upon one or more of the approaches we describe, 
tailoring their effort to fit the particular problem confronting them. 

Finally, the resource books are not comprehensive technical manuals. 
Their purpose is to help local school personnel identify issues that might 
merit further examination and to guide the choice of suitable evaluation
strategies to address those issues. Additional information and assistance 
in using the various evaluation strategies are available in the more techni- 
cal publications cited at the end of each volume, and from the Technical 
Assistance Centers in the ten national regions. 






TABLE OF CONTENTS

FOREWORD

I. INTRODUCTION

II. SPECIAL ISSUES IN EARLY CHILDHOOD ASSESSMENT
    Characteristics of Young Children that Make Assessment Difficult
    Measurement Considerations

III. OBSERVATIONAL APPROACHES TO EARLY CHILDHOOD ASSESSMENT
    The Case for Observational Approaches to Early Childhood Assessment
    Potentials and Limitations of Observational Approaches

IV. USES OF EARLY CHILDHOOD ASSESSMENT INSTRUMENTS
    Administrative and Public Accountability
    Making Decisions Concerning Individual Students
    Guidance of Teachers in the Classroom
    Evaluation
    Using Early Childhood Assessment for Multiple Purposes

V. SELECTING AND USING EARLY CHILDHOOD ASSESSMENT INSTRUMENTS
    Screening Potential Instruments
    Trying the Test Out
    Using and Interpreting Tests

APPENDIX 1
    Notes on Sources of Further Information

APPENDIX 2
    Listing of Selected Early Childhood Instruments and
    Sources of Review Information on Each

APPENDIX 3
    Annotations on Early Childhood Instruments

APPENDIX 4
    Reviews of Selected Early Childhood Instruments

REFERENCES

I. INTRODUCTION 



[...] of young children and their environments presents
[...] problems . . . of the limited response
[...] young [...] the very [...] occur.

S. Anderson et al., 1972

As this observation by a panel of experts in child development suggests, 
educational assessment of young children carries certain problems that do not 
necessarily arise in assessing older children. The goal of this booklet is to
describe some of the special challenges posed by early childhood assessment in
general, and particularly as they apply to Title I program evaluation. The
booklet thus has four purposes:

• To describe special issues in early childhood assessment

• To describe briefly alternative approaches to early childhood assessment

• To [...] to various purposes of assessment,
particularly to that of Title I program evaluation

• To [...] how to select [...]

Subsequent chapters correspond to these four purposes, and appendices provide: 

• Sources recommended for information on early childhood assessment

• A listing of selected early childhood instruments and sources of review
information on each

• Annotations to illustrate how potentially useful instruments can be
initially screened

• Descriptive reviews of instruments to illustrate information helpful
in selecting among candidate instruments.

Before going further, let us explain briefly why this booklet was written. 
It has been developed as part of a project, sponsored by the United States 



Education Department, on evaluation of early childhood Title I programs. During
an earlier stage of this project, state and local education personnel concerned
with Title I expressed a variety of needs for information on early childhood
testing and assessment (Bryk, Apling, & Mathews, 1978). In particular, these
educators expressed:

• Frequent demands for information on technical and procedural problems
in early childhood testing

• Concerns about the match between testing and early childhood program
curricula

• Considerable interest in a wide range of tests and instruments,
particularly ones concerning psychomotor, social, emotional and language
development

• Interest in alternative means of assessment, including observation
instruments and behavior inventories.

This booklet is a response to at least some of these needs. Its focus is on
special issues in the educational assessment of young children. We define
educational assessment broadly to mean systematic measurement, via testing or
observation, of individual behavior, traits, or other educationally relevant
characteristics.* A narrower definition might be simply standardized testing.
However, there are some very good reasons why early childhood assessment should
not be confined to this form of measurement. We will elaborate on some of
these reasons in Chapter 2. Here let us point out only that for many goals of
early childhood education programs, no good paper-and-pencil tests are available.
Hence, as many experts have pointed out (e.g., Walker, Bane & Bryk, 1973; Brooks
& Weintraub, 1976; Goodwin & Driscoll, 1980), other forms of assessment,
including systematic observation and rating scales, may be particularly
appropriate for use with young children. For this reason, in Chapter 3, we will
briefly review some of the potential benefits and drawbacks of alternative
approaches to early childhood assessment.

* Some people define educational assessment even more broadly to include
systematic measurement of characteristics and traits of educational programs
and environments (e.g., Goodwin & Driscoll, 1980). For many purposes (for
example, evaluating program implementation), such assessment may be essential.
However, in order to limit the scope of this resource book we focus mostly
on assessment of children.






II. SPECIAL ISSUES IN EARLY CHILDHOOD ASSESSMENT 

What are the special issues in early childhood assessment that can cause 
problems? Why is it more difficult to assess young children than older
children? There are two perspectives from which to answer these questions. The
first deals with the nature of child development, and the characteristics of 
young children that make assessment difficult. The second treats these issues 
in terms of traditional measurement considerations: validity, reliability, 
and norms. The first two sections below describe these perspectives, and in 
the next chapter we describe some of the potentials and problems of observational
and rating approaches to early childhood assessment. 

CHARACTERISTICS OF YOUNG CHILDREN THAT MAKE ASSESSMENT DIFFICULT

As the quotation at the start of this booklet suggests, the assessment
of young children is more difficult than that of older children. This is due
not merely to measurement problems per se, but also to real and important
features of how young children develop. Before discussing assessment issues
from the measurement perspective, let us first summarize some of the features
of child development that have implications for educational assessment.

One of the most obvious problems in the assessment of young children
is that they cannot read and may lack other test-taking skills which we assume
of older children. Thus, tests that require reading of instructions obviously
cannot be used with young children. As an alternative, many tests for early
elementary grades rely entirely upon oral instructions from the adult
administering the test, and answer alternatives are presented in pictures or
drawings. Yet even with oral instructions, children's short attention spans (at
least with respect to tasks they have not chosen for themselves) may prevent them
from following directions correctly. Comprehending oral instructions, giving
continued attention to a relevant item or picture, and marking or otherwise
indicating a response alternative all may be difficult tasks for young
children, and may get in the way of assessing other skills or attributes of
children. To cite one concrete example, young children may lack the fine
motor skills necessary for marking some types of machine-scoreable answer
sheets. For this reason, the use of separate answer sheets is generally not
appropriate with early elementary children; and with preschool or kindergarten
children, it may be necessary to use individual assessment procedures in
which the test-giver marks the child's answer. For many young children this
may be the only way to avoid confounding real skills of interest with clerical
skills of test taking. Also, when pictures or drawings are used in early
childhood assessment, children may interpret them in unusual ways in light of
their own experience.

A second issue which complicates assessment of young children is that 
their cognitive and affective development are not easily disentangled (Bradley
& Caldwell, 1974). Cognition and affect seem to develop together in young
children and to interact, making measurement of one dependent upon the other,
until children are socialized into school and society and affective behavior 
becomes more stable. In other words, how children feel about a task or what 
mood they are in may easily influence their performance. Young children may 
have little interest in externally imposed tasks, and their attention to 
such tasks may easily wander (Pikunas, 1976). They may tire quickly 
(Illingsworth, 1972), and their responses to assessment procedures may be 
influenced by hunger, restlessness, desire to please, or a multitude of other 
motives and circumstances. The reactivity of young children thus makes their 






performance in testing and assessment particularly susceptible to extraneous
influences. Research suggests, for example, that young children's test
performance is more apt to be influenced by situational variables, including
the ethnicity of the test giver, than that of older children (Epps, 1974).

Children's interpretation of assessment tasks and questions may also
depend on their level of development. Young children's tendency to view
things in relation to themselves and their experience (often called
children's egocentrism) may prevent them from interpreting a question in the way
an adult expects. Two examples will help to illustrate this point. On a
standardized test, when one young boy read a short reading passage and then was
asked why, in a test item, a girl named Susan watches television, he marked the
response alternative expected from the passage: "Because Susan likes to watch
television." Yet when the child was asked to explain his answer, he said:
"Because I like to watch television." His answer derived not from the story,
but from his egocentric perspective of why he watches television. In another
example, from a first grade reading test, children were asked to mark
the one picture out of three that goes best with the word next to the
pictures. One item contained the word "fly" with an arrow pointing to pictures
of an elephant, a bird, and a dog. Instead of marking the intended answer,
the bird, many first graders had chosen the elephant, or the bird and the
elephant. Asked to explain their answers, children identified the elephant
as "Dumbo," the flying elephant (Mehan, 1978, p. 51). In short, whatever
their chronological age or grade level, children's interpretations of
assessment tasks may be strongly influenced by many factors, including their
personal experience and how they feel at the time of assessment.






In their early years, children develop rapidly, and while all children
tend to pass through the same stages of development, they may do so at
different rates. These two aspects of child development greatly complicate
the use of systematic procedures in early childhood assessment. A procedure
that is appropriate for one five-year-old may not work at all for another.
As one child development expert put it:

While the developmental rate is high during the preschool
years, great variability in scores from successive testings
is not uncommon. An appreciable degree of consistency
emerges only after about age five when the developmental
rate has slowed greatly and when going to school brings a
relatively common program of environmental encounters into
the lives of children.

(Hunt, 1961, p. 313)

MEASUREMENT CONSIDERATIONS

The various characteristics of young children that make assessment and
testing more difficult than with older children can also be viewed
from another perspective: that of the measurement qualities of assessment
procedures. In particular, tests and instruments for use with young children
are generally of lower technical quality than those for use with older
children. Measurement experts have made this observation (e.g., Goodwin &
Driscoll, 1980; Walker, Bane, & Bryk, 1973), and the point has also been shown
in systematic reviews of tests. When the Center for the Study of Evaluation
(CSE) of the University of California at Los Angeles reviewed some 800
published standardized tests for the elementary school level, including
over 3,900 subtests, they found that only nine of the first grade level
subtests, or less than one percent, received minimally satisfactory ratings.
At higher grade levels, both the numbers and proportions of tests CSE rated
as minimally adequate increased steadily (Hoepfner et al., 1976). At the
first grade level, only in the domain labeled "cognitive and intellectual
skills" was more than a single test rated as minimally adequate.* Such
evaluations confirm that early childhood tests lack the technical qualities
of later-grade tests. This general contrast also tends to hold with respect
to the specific measurement qualities of validity, reliability, and norms.



Validity 



The most important aspect of assessment quality is validity, that is,
whether an assessment instrument really does measure what it purports to
measure. Though people often speak of validity as if it were a characteristic
of a test or assessment instrument, this is not really appropriate.
Strictly speaking, the validity of an assessment procedure resides not in
the instrument itself, but in its use in a particular way with a particular
population. As Cronbach observed, "one validates not a test, but an
interpretation of data arising from a specified procedure" (Cronbach, 1971, p. 447).
Exactly how to conduct such validation is still a point of considerable
debate among measurement experts, but three types of validity criteria are
widely recognized:

• Evidence of content validity is required when the test user wishes
to estimate how an individual performs in the universe of situations
the test is intended to represent.

• Construct validity is implied when one evaluates a test or other
set of operations in light of a specified construct, that is, an
idea developed or "constructed" as a work of informed, scientific
imagination, such as "intelligence," "readiness," or "social
competence." In other words, a construct is a theoretical idea
developed to explain and organize some aspect of existing knowledge.

• Criterion-related validity applies when one wishes to infer from a
test score an individual's most probable standing on some other
variable called a criterion. There are two forms of criterion-related
validity. Predictive validity refers to inferences regarding
future performance, while concurrent validity refers to inferences
concerning performance observed or measured at approximately
the same time as testing or assessment takes place.

(APA, AERA & NCME, 1974)

* The CSE also rated quality of prekindergarten and kindergarten tests in
an earlier study (Hoepfner et al., 1971). However, since the [...]
were different in this study than in the one cited above on
elementary level tests, the results cannot be directly compared. See
Haney et al., 1978, pp. 111-119, for details of ratings across grade levels
and measurement domains, as well as for a review of criticisms of the CSE
approach to rating test quality.

Although test publishers tend to emphasize content validity in docu- 
menting the quality of their instruments, some experts have argued strongly 
for the importance of construct validity instead of content or criterion-related
validity (Messick, 1975). Also, others have recently argued
that for educational purposes, tests should have curriculum and instructional 
validity, i.e., they should be related to the content of curriculum and 
instruction. Since such arguments cannot be resolved in the abstract, let 
us simply discuss some of the general validity considerations in early child- 
hood assessment. 

Two of the most common types of early childhood instruments are intel- 
ligence tests and readiness tests. Indeed, intelligence and readiness are 
two of the most familiar constructs in early childhood assessment. Yet in 
practice there is much confusion about what it is that each of these terms 
or constructs actually encompasses. For example, in their test evaluation
project, CSE investigators actually classified subtests of some intelligence
tests as measuring "readiness skills" (Haney et al., 1978, p. 120). In a
recent federal court case in California, lengthy expert testimony revealed 
the widely conflicting opinion and confusion that exists in the field of 




educational and psychological measurement over the meaning of intelligence and 
whether and how tests of intelligence relate to it. This confusion was one of 
the reasons why the judge in the case ruled that intelligence tests are biased 
against minority children and illegal to use in placing children in classes 
for the educable mentally retarded (Larry P. v. Riles, 1979).*

Similar confusion surrounds the term and construct of readiness. 
School and reading readiness tests are commonly used in early education pro- 
grams in the United States and have been published in this country for at 
least half a century. Yet there is still little agreement about what constitutes
school readiness or reading readiness. Reflecting this disagreement,
reading readiness tests vary considerably in the skills they cover. This
was shown by Rude (1973) in his analysis of five major reading readiness
batteries to determine which of twelve specific skills were actually assessed.
The number of skills assessed on any one readiness battery ranged from three
to seven. Eight skills were assessed on only one of the batteries. The
only skill that was assessed in all five batteries was letter recognition.
The main problem with all such readiness tests is that one cannot gauge
their value without specifically addressing the question of readiness for 
what: not just readiness for first grade or for reading, but for what kind 
of first grade or reading. 

Disagreement and confusion over what is meant by reading readiness 
and intelligence does not mean that tests which go by these names are 
useless. It does mean, however, that if one wants to use an early childhood 
instrument for a particular purpose, one should not simply accept an instrument
at face value and assume that it measures a construct such as intelligence
or readiness, but should carefully examine the nature and validity
of the instrument in light of that purpose. If an instrument is to be
used to help select children for future participation in special programs
like Title I, then attention should be given to its predictive validity,
that is, to how well results will predict children's future performance. If
the instrument is to be used to evaluate a program, then consideration
should be given to how well the content of the instrument matches the content
of the program of instruction. If one wishes to use an instrument to infer
something about a construct or general aspect of children's development,
say general reading achievement, then special consideration needs to be given
to construct validity. In short, different potential uses of a test or
other assessment device require attention to different kinds of validity
evidence.

* The initial decision in the Larry P. case is currently under appeal.

This point, which is relevant to testing and assessment generally, is
especially important with respect to assessment of young children, since
several extraneous aspects of assessment can have a strong influence on results
for young children. Research suggests, for example, that how test instructions
are given to children before assessment can affect results more for younger
than for older children (Gaffney & Maguire, 1971). Also, use of separate
machine-scoreable answer sheets can affect test performance of young children
more than that of older children (Ramseyer & Cashen, 1971). Indeed, one
test expert has advised: "In testing children below the fifth grade, the
use of any separate answer sheet may significantly lower scores . . . .
[At lower] grade levels, having the child mark the answers in the test booklet
itself is generally preferable" (Anastasi, 1976, p. 36). These issues in
early childhood assessment are not, of course, simply assessment problems;
rather, they reflect important characteristics of child development
discussed above. They can also affect the qualities of testing and assessment
known as reliability and norming.

Reliability and Measurement Error

Reliability refers to the accuracy or consistency of measurement. 
Three types of reliability are most commonly treated in the educational 
measurement literature: 

• Internal consistency refers to the extent to which all items or
parts of an assessment measure the same thing

• Alternate form reliability means the comparative accuracy of results
from equivalent forms of the same assessment instrument

• Stability refers to the consistency of assessment results over time.

(APA, AERA & NCME, 1974)
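
One widely used index of internal consistency, which this booklet does not itself name, is coefficient alpha. The sketch below is a minimal illustration only: the function names and the five children's item scores are hypothetical, and the use of sample variances (dividing by n - 1) is an assumption of this sketch rather than a requirement of any particular test manual.

```python
def sample_variance(values):
    """Sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """Coefficient alpha for a table of scores.

    scores: one list per child, each holding that child's score on every item.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two children and two items")
    n_items = len(scores[0])
    # Variance of each item across children.
    item_variances = [
        sample_variance([child[i] for child in scores]) for i in range(n_items)
    ]
    # Variance of the total score across children.
    total_variance = sample_variance([sum(child) for child in scores])
    if total_variance == 0:
        raise ValueError("total scores must vary across children")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) scores for five children on four items.
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(item_scores)
print(f"coefficient alpha: {alpha:.2f}")
```

Values nearer 1 indicate that the items tend to rank children in the same way; with very few items or very few children, as here, the coefficient is unstable and serves only to show the arithmetic.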

Although these three types of reliability have been widely recognized in 
the past, numerous sources of error in assessment are intertwined with 
far more complexity than is represented by just these three (Cronbach,
Gleser, Nanda, and Rajaratnam, 1972). Indeed, when pursued thoroughly,
issues of reliability or dependability begin to merge with issues of
validity. And like validity, reliability cannot be treated very sensibly
in the abstract. 

Some people have tried to rate test reliability independently of test
use,* but this ignores the obvious point that reliability of assessment is
more important for some uses than for others. In general, the more
consequential the assessment, the more we ought to be concerned with test
reliability or accuracy. If a test is to be used to select children for a
special program of some duration, such as Title I, then reliability matters
far more than if it is to be used only for a monthly check on children's
progress. Also, the intended use for a test or other assessment will affect
the form of reliability evidence that should be considered. If a test is to
be given in the spring as a means of helping to decide which children should
receive Title I services in the fall, the stability of test scores over time
would be an extremely important aspect of reliability. For other types of
use, other aspects of reliability would be more pertinent.

* In rating elementary school tests, for example, CSE investigators
awarded three points to any test reporting an internal consistency
coefficient [...] .90, two points if the coefficient "ranged from .70
to [...], 1 point if less than .70, and zero points if no appropriate
coefficient was reported" (Hoepfner et al., 1976, p. xxix). Points were
awarded similarly for test stability and alternate form reliability.
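
To make the spring-to-fall case concrete, stability is usually summarized as the correlation between scores from two administrations of the same instrument to the same children. What follows is a minimal sketch, not a procedure taken from any test manual: the function name `stability_coefficient` and the scores for six children are hypothetical, invented only for illustration.

```python
from math import sqrt


def stability_coefficient(first, second):
    """Pearson correlation between two administrations of the same test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length score lists, at least 2 children")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sums of cross-products and squared deviations from each mean.
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    ss_a = sum((a - mean_a) ** 2 for a in first)
    ss_b = sum((b - mean_b) ** 2 for b in second)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("scores must vary within each administration")
    return cross / sqrt(ss_a * ss_b)


# Hypothetical raw scores for the same six children, spring and fall.
spring = [12, 15, 9, 20, 14, 17]
fall = [14, 15, 11, 19, 18, 16]
r = stability_coefficient(spring, fall)
print(f"stability (test-retest) coefficient: {r:.2f}")
```

A coefficient well below 1 would mean that children's relative standing shifted between the two testings, which is exactly the risk when spring scores are used to assign fall services.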

Like validity, reliability of assessment can be more problematic with
young children than with older ones. Indeed, it poses a special dilemma for
early childhood assessment. Internal consistency and stability both tend to
be lower in assessment of young children than in that of older ones (Walker,
Bane & Bryk, 1973, p. 26; Brooks & Weintraub, 1976, p. 39). As noted above,
young children generally have shorter attention spans than older children,
at least for tasks that are not of their own choosing. As a result it is
important that assessment tasks for young children be kept short. The
problem this raises, however, is that the shorter the test (that is, the
fewer items it encompasses), the lower its reliability will be. Developers
of early childhood tests and instruments get around this problem in several
ways. First, they often organize assessment procedures into several
relatively short sessions of only 10 to 15 minutes for kindergarten-aged
children and 15 to 20 minutes for first graders. This can help to avoid
problems of inattention and fatigue that would likely result from longer
sessions. Second, many early childhood assessment instruments are
individually rather than group administered, which can also help to maintain
children's interest. Third, assessment tasks can be designed so as to be
of intrinsic interest to children; indeed, some publishers of early
childhood tests suggest that they should be described to children not as tests,
but as games.
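
The trade-off between test length and reliability described above is commonly expressed with the Spearman-Brown formula from classical test theory, which the booklet itself does not cite. The sketch below assumes that any items added or removed are comparable to the existing ones; the starting reliability of .90 for a 40-item test is a hypothetical figure chosen for illustration.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    reliability: reliability of the original test (0 to 1).
    length_factor: new length divided by original length (0.5 = half as long).
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical 40-item test with reliability .90, shortened for young children.
original_items = 40
original_reliability = 0.90
for items in (40, 20, 10):
    predicted = spearman_brown(original_reliability, items / original_items)
    print(f"{items:2d} items -> predicted reliability {predicted:.2f}")
```

Under these assumptions, halving the test lowers the predicted reliability from .90 to about .82, and cutting it to a quarter of its length lowers it to about .69, which is why short sessions are typically combined rather than used singly.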

Norms 

The third aspect of technical quality that should be mentioned is norms. 
Norms represent the performance on an instrument of some sample of persons 
with whom the instrument was standardized or normed. Norms are "empirically 
established by determining what a representative group of persons actually 
do on the test" (Anastasi, 1976, p. 76). A score derived from the test or 
assessment procedure can then be interpreted in terms of the distribution of 
scores obtained by the group who participated in the instrument's norming
or standardization. 

For many early childhood tests and instruments, norms are nonexistent
or, if available, are limited in certain respects. Early childhood instruments
that are designed to assess children's performance on specific tasks, for
example to help ascertain whether children can do certain things like
tying their shoes or saying their names, may have no norms. When early
childhood test norms do exist, they generally are not based on nationally
representative samples. 

When early childhood test norms are available, they may be limited
in other respects. Some readiness tests have start-of-school-year norms,
for example, but since they are designed as screening instruments to
assess children's status upon school entry, no empirical norms may be
available for end-of-year performance. This contrast also reflects the
point, noted above, that young children develop rapidly. A test which is
useful with a group of five-year-olds or six-year-olds in the fall may
simply not be useful with them the following spring. Also, children's
performance on early childhood tests can be sharply influenced by preschool
or early school experience, which can complicate use and interpretation
of norm-referenced results. The Comprehensive Test of Basic
Skills (CTBS) Level A (Form S), for example, provides two sets of norms
for the beginning of first grade: one for students who attended kindergarten
and one for students who did not. On the alphabet subtest of the
CTBS Level A, a particular raw score can vary by as much as 40 percentile
points when interpreted in terms of the two sets of norms (CTB/McGraw-Hill,
1974). In similar fashion, children's kindergarten performance can
be sharply affected by whether or not they have attended prekindergarten.
This complicates norm-referenced interpretations of early childhood test
results, because "the experiences of the preschool child are less uniform
than those of older children who are attending school" (Broman, Nichols &
Kennedy, 1975, p. 38).
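
The effect of the norm group on interpretation can be sketched with a simple percentile-rank calculation. Everything in this example is hypothetical: the two norm samples are invented and are not CTBS data, the function name is illustrative, and counting tied scores as half below is one common convention among several.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of the norm group scoring below raw_score (ties count half)."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_scores if s < raw_score)
    ties = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


# Hypothetical beginning-of-first-grade norm samples (not actual CTBS norms).
attended_kindergarten = [10, 12, 14, 15, 16, 17, 18, 19, 20, 22]
no_kindergarten = [4, 6, 7, 9, 10, 11, 12, 14, 15, 17]

raw = 14  # the same raw score, read against each norm group
for label, norms in (
    ("attended kindergarten", attended_kindergarten),
    ("did not attend", no_kindergarten),
):
    print(f"{label}: percentile rank {percentile_rank(raw, norms):.0f}")
```

The same raw score lands low in one group and high in the other, so a reported percentile is meaningful only when the user knows which norm group it refers to and whether that group resembles the children being assessed.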

In sum, early childhood tests and instruments tend to be of lower
technical quality than those designed for use with older children. Validity
of assessment of particular attributes of young children may be threatened
when assessment results are confounded with aspects of assessment procedures
such as children's skill in listening and ability to follow directions.
Reliabilities of early childhood assessment instruments tend to be lower
than those of instruments for older children. And norms for early childhood
instruments are often unavailable or are based on samples of children
far smaller than those used in norming later grades' tests. Interpretation
of norm-referenced results with younger children is also complicated
by the fact that children's preschool or early school experience can sharply
affect such results. The point that should be stressed, however, is that
these qualities of early childhood tests and instruments do not reflect
technical issues so much as they represent real and important characteristics
of young children: that they grow and develop rapidly, that aspects of their
cognitive, social, and affective development interact, and that they are
not so accustomed to school and the procedures of educational assessment as
are older children.






III. OBSERVATIONAL APPROACHES TO EARLY CHILDHOOD ASSESSMENT
Because of the many factors which can complicate the educational 
assessment of young children, alternative approaches to assessment, with 
alternative strengths and weaknesses, may be useful. These alternatives 
include interviews with children, documentation and recording of their 
educational activities and interests, and talking with parents about their 
children's learning. Such assessment techniques are, of course, nothing 
new. Teachers of young children typically rely upon just such varied 
assessments for a variety of purposes. Though most often used informally,
such approaches can also be adapted to purposes of systematic assessment.
Experience with large-scale evaluation has shown, for example, that techniques
such as structured interviews with parents can illuminate aspects of early
childhood programs which cannot be illuminated directly through traditional
testing of children (see Haney & Pennington, 1978, for an example of how
analyses of systematic parent interviews were used in this way with respect
to Project Follow Through).

In this chapter we briefly describe several varieties of a general form
of educational assessment which is often overlooked, namely systematic
observation. First, we discuss why observation can be an especially useful
approach to assessment of young children and describe exactly what is meant
by systematic observation. Second, we briefly describe five types of
systematic observation and an example of each. Third, we describe some of
the general potential value and the limitations of observational approaches
to early childhood assessment.






THE CASE FOR OBSERVATIONAL APPROACHES TO EARLY CHILDHOOD ASSESSMENT

Two experts recently summed up the case for using observational tech- 
niques in early childhood assessment as follows: 

Observational measurement is of particular importance in 
early childhood education for three reasons. First, and 
possibly most important, it affords a means of measuring
many child behaviors that might otherwise be immeasurable.
Very young children, say five years and under, have a limited
response repertoire, and especially if verbal-related. Thus
they may be unable to make the response or provide the infor- 
mation that a more conventional measure, such as an interview 
or a paper-and-pencil test, may require. Observational 
measurement may offer particular advantage in the affective 
domain. . . . 

A second reason for the appropriateness of observational 
measurement in early childhood education is that young 
children frequently fail to take testing procedures 
seriously. ... 

The third reason relates to the generally held assumption 
that very young children are open and relatively unchanged 
or unperturbed by being observed. 

(Goodwin & Driscoll, 1980, p. 111)
Before describing different types and examples of observational approaches
to early childhood assessment, let us specifically explain what is
meant by the term. Observational measurement refers to the systematic
recording of the behavior or other characteristics of children. This includes
use of checklists, rating scales, and observation scales, as well as many
individually administered early childhood tests in which the examiner rather
than the child records children's responses to assessment tasks. Indeed, the
fact that portions of commercially published early childhood "tests" such as
the CIRCUS and the McCarthy Scales (both described in Appendix 4) call for
the examiner's recording or rating of children's responses is clear testimony
to the importance of not confounding the assessment of children's
characteristics or behaviors with their test-taking skills in general and
their skill in recording answers in particular. This point cannot be
overemphasized. Research has shown, for example, that scores on
paper-and-pencil tests of children's "self-concept" may correlate more highly
with children's performance on paper-and-pencil tests of achievement than they
do with one another (see Haney, 1977, pp. 319-322, for a discussion of
just such a pattern of results in the national Follow Through evaluation).
In short, paper-and-pencil tests may confound young children's test-taking
skills with other attributes they intend to measure. For such reasons,
many experts (e.g., Walker, 1973, p. 38) have suggested that non-verbal
observational techniques may be more valid and reliable means of measuring
many characteristics of young children, particularly non-cognitive ones.

Dozens of early childhood observation systems are available and many
of them have been used in a variety of settings and for a variety of purposes.
(See, for example, Boyer, Simon & Karafin's Measures of Maturation: An
Anthology of Early Childhood Observation Instruments, 1973, described in the
Notes section, Appendix 1 of this booklet.) In the paragraphs below we
describe five different general types of observation instruments, with
one example of each. Examples are given for illustrative
purposes, not because they are necessarily recommended for general use.
Indeed, observational approaches generally will have to be adapted for the
particular use intended.

Continuous Records

It is impossible to observe and record everything that goes on
in any classroom or social setting. Nevertheless, a continuous-record
approach to observation attempts to document relevant behaviors of a child,
or events in a classroom, in a continuous, organized manner. Such
behaviors can be recorded in narrative fashion or with some sort of
checklist. Jane Stallings in her handbook Learning to Look (1977) describes
how, as a teacher, she used a narrative continuous record to help understand
and deal constructively with one troublesome youngster:

Once, in desperation, when I could not understand the behavior
of Billy, a particularly disturbing second-grade child,
I hired a college student to come in and write a running account
of everything he did for two days. From this, I received
sixty hand-written pages of narrative.

The information was most valuable. I learned that on the
first day, Billy had gotten up and wandered about the room
fifty-seven times. Since the school day was five hours long
this was about ten times an hour. He had fallen off his
chair [count illegible in source] times. He had picked his nose seventeen
times and rubbed his eyes twenty-three times. He had
received thirteen smiles from me and twenty-seven reprimands,
mostly to stop falling off his chair and pay attention.
He initiated conversations with other children forty-four
times, but the interaction was only one or two sentences
long. He spoke to everyone who passed his seat and tried
to trip three people, succeeding twice. He was rejected
fifteen times by other children who were involved in some
activity and was physically pushed away from a group of
three who were working on a mural. During recess, he put
a blanket over his desk, took his reading workbook, and
disappeared underneath. He stayed there for five minutes.
The second day's observations were similar, and the picture
that emerged was one of a hyperactive, highly distractible
child.

Supported by these specific descriptions, I requested conferences
with his parents, his doctor, a reading specialist,
and the school psychologist. The written account of his
behavior enabled me to present factual information with a
minimum of inference. As a result of these meetings, an
educational program was planned that helped Billy progress
in his learning. (p. 9)

Time Sampling 

Continuous recording obviously can be an expensive and time-consuming
approach to observational assessment. An alternative is to use
a time-sampling approach, under which observations are made at specified
time intervals. The key ingredients of a time-sampling observation system
are:

• The behavior or trait to be observed is defined in operational
terms (specific actions or conditions).

• A time unit of observation (ranging from as little as one second
to 15 minutes or more) is specified.

• A sampling strategy is specified (for example, observations
might be made for the first 10-minute interval of each hour
of the day).
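The sampling strategy named in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the calendar date, and the 9:00 to 2:00 school day are hypothetical, chosen to show how a "first 10 minutes of each hour" schedule might be laid out before observation begins.

```python
from datetime import datetime, timedelta


def sampling_windows(day_start, day_end, window_minutes=10, period_minutes=60):
    """Return (start, end) observation windows: the first `window_minutes`
    of every `period_minutes` block between day_start and day_end."""
    windows = []
    start = day_start
    window = timedelta(minutes=window_minutes)
    while start + window <= day_end:
        windows.append((start, start + window))
        start += timedelta(minutes=period_minutes)
    return windows


# Hypothetical five-hour school day (date and times are assumptions).
school_day = sampling_windows(
    datetime(1980, 12, 1, 9, 0),
    datetime(1980, 12, 1, 14, 0),
)
for begin, end in school_day:
    print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

Under these assumptions the observer would watch for ten minutes at the top of each of the five hours, for fifty minutes of observation in total rather than the full five hours a continuous record would require.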

A number of problems arise in applying such an observational strategy,
of course, but since most of them are common to other observational
techniques, let us postpone that discussion. Instead we simply illustrate
this technique by describing an early application of a time-sampling
approach used by Ruth Arrington (1932, also described in Wright, 1960, and
Hutt & Hutt, 1970). Arrington's research concerned the behavior of young
children. Her observational system was based on two checklists concerning
activities which engaged children (use of materials, physical activity or
no overt activity) and their social interactions (talking with others,
non-social vocalizing, physical contact, laughing, or crying). These categories
were defined to be mutually exclusive in terms of overt behavior of children.
Individual children were observed during free play periods, using
five-minute observation sessions during which children's activity engagement
and social interactions were recorded every five seconds using special
checklist forms (see Hutt & Hutt, 1970, for an example of Arrington's
checklist forms). Checklist records were then analyzed to determine the
frequency with which different sorts of behavior occurred for individual
children or different types of children. Arrington found, for example, that
for nursery school children, non-social vocalizing was more frequent than
social speech, and that children tended to converse primarily with members
of their own sex.
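The kind of frequency analysis described for time-sampled checklists can be sketched as follows. The record below is invented, the category labels are shortened forms of the social-interaction categories named above, and the "none" code for intervals with no social behavior is an assumption added for illustration; this is not a reconstruction of Arrington's actual forms or data.

```python
from collections import Counter

INTERVAL_SECONDS = 5       # one checklist entry every five seconds
SESSION_SECONDS = 5 * 60   # five-minute observation session


def summarize(codes):
    """Return the proportion of intervals in which each code was recorded."""
    counts = Counter(codes)
    total = len(codes)
    return {code: counts[code] / total for code in counts}


# Invented record for one child: one mutually exclusive code per interval.
record = (
    ["nonsocial_vocalizing"] * 24
    + ["talking_with_others"] * 12
    + ["none"] * 24  # hypothetical code: no social behavior observed
)

# A full session should contain exactly 60 five-second intervals.
assert len(record) == SESSION_SECONDS // INTERVAL_SECONDS

summary = summarize(record)
print(summary)
```

Because the categories are mutually exclusive, the proportions sum to one, and comparing them across children or groups is a matter of comparing such summaries.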

Event Sampling 

Like time sampling, event sampling can be more efficient than continuous 
recording. However, instead of observing and recording events in terms 
of a prespecified time sample, event sampling focuses on prespecified 
types of events or behavior. For example, such an approach might focus
on question-asking behavior of children, or specific types of social 
interaction or their use of a play area. 

Goodwin & Driscoll (1980) describe an event-sampling procedure employed
in Kounin's (1970) study of kindergarten teachers' handling of classroom
misbehavior during the first few days of school. In this study, the
focus of observation was teachers' efforts to stop misbehavior, or what was
called a desist. In addition to this primary event, observers also recorded
information concerning the influence of the incident on neighboring
children.

When a teacher directed a desist at a misbehaving child,
the observer recorded what the deviant child had been doing
as well as activities of the audience (other children looking
on), the nature of the desist and the deviant child's immediate
reaction, and the behavior for the next two minutes of the
nearest student witnessing the desist. . . . observers waited until
after the event to record particulars but did so immediately
afterward to help assure fidelity of memory.






Subsequent analysis and interpretation of the data on the
[words illegible in source] that the ripple effect did, in fact, occur.
[Words illegible] a desist on the first day of kindergarten
[words illegible] overt reaction on following days. [Words illegible]
they were more likely to behave
[words illegible] or to show behavior disruption after
a desist. Deviancy-linked children showed more
[words illegible] deviancy-free children, and they were
more likely to decrease deviancy and increase conformity if
[words illegible] high in firmness. Clarity of desist influenced
[words illegible] of children in the direction of conformity
[words illegible], in general, a [word illegible] determiner of the nature of the
ripple effect than was firmness. Although rough desists
upset many children, their overall effect on conformity and
non-conformity was slight.

(Goodwin & Driscoll, 1980, pp. 122-123)

Trait Rating 

A fourth general type of observational technique is trait rating. 
With this approach, an observer does not directly describe behaviors or 
events, but instead, after observing a child or a classroom for a period 
of time, rates a general trait or characteristic of what was observed. 
A kindergarten teacher, for example, after watching and working with a
child during the course of the school year, might rate a child in terms
of the trait of readiness to begin a particular type of reading instruction.

In one observational study, which was part of the national evaluation
of Project Follow Through (FT), for example, observers were asked to rate
several dimensions of FT first grade classrooms. Using a Physical
Environment Information form which was developed as part of SRI
International's observational study of FT, observers coded information on
various aspects of the classroom setting: presence and use of specific
equipment, instructional materials, games and toys; whether the classroom
has movable or stationary tables and chairs; whether children's seating
is assigned or self-selected; and whether children are assigned to or
select their own groups (Stallings & Kaskowitz, 1974, pp. 23-25).
Subsequent analysis showed that the ratings could be used to discriminate
reliably between classrooms affiliated with different FT model sponsors,
and that some of these ratings were significantly related to children's
later behavior in school and on tests.

Work Samples

A final type of observational technique is even less direct than the 
approaches described so far. Instead of observing children's behavior 
or classroom events directly and recording or rating them, this approach 
relies upon the collecting or recording of specimens of children's work; for 
example, drawings or other artwork and written materials. Again, we should 
point out that this form of assessment is by no means anything new. For 
decades teachers of young children have regularly sent children home 
with samples of their artwork and writing, as a means of helping parents 
appreciate what children have been learning. What is not so often recog- 
nized, however, is that such work samples also have potential value for 
systematic assessment. 

Carini (1978) provides an example of this in what she calls
documentary processes. She points out that the "accumulated work of a
child in a medium such as writing, painting or blocks can be a focus of
discussion" for teaching staff and parents. She describes how she
employs such documentation, as follows:






The first step in the documentation and portrayal of a
child is to arrange the diverse forms of data (records,
children's work, interviews, etc.) in chronological order.
The entire record is re-read several times and pieces of the
child's work are selected for description through a reflective
conversation. For example, for a child (Misha) for
whom the motif of houses is pervasive in stories and drawings,
a number of reflections were carried out including "hidden",
"domestic", and "wild". These reflections were followed
by detailed descriptions of specific pieces of work.

Immersion in the records and in the work allows themes
or headings to emerge.

The initial charting is followed by an unspecified number
of rechartings according to the motifs, mediums and themes
suggested by the initial exploration. Some of these headings
are refinements of earlier headings, while others cut through
the data from subtler angles than the more global characterization
of the data provided by the initial headings.

The last step in the study is the descriptive essay in which
all of the data is integrated in order to portray the child.
Stated concretely, the essay reflects the thematic patterns
emergent from the records, and employs the particular data
within the records to document those patterns.

(pp. 3-11)

Carini's systematic gathering and analysis of children's work samples
together with other sorts of assessment information is quite unusual, but
she explains that such methods of documenting and portraying children and
their learning can prove extremely valuable.

To portray the person to those primarily responsible for his
or her education, teachers and parents, is to increase
dramatically their capacity to make thoughtful choices
in the interests of the child's education. At each point
in the extended process described above, there is examination
of setting, teaching practice, and the continuity of the
child's experience and thought. It is also true that to see
and know any one child fully is to know all children
better. The uniqueness of the one calls up his or her
shared perspectives with particular others, and embeds that
perspective within the full range of human experience.

(Carini, 1978, p. 14)






POTENTIALS AND LIMITATIONS OF OBSERVATIONAL APPROACHES 

As the examples cited above illustrate, there are several different
sorts of approaches to observational assessment. These approaches need not
rely exclusively on one type of sampling (for example, trait or time
sampling) but instead can combine sampling strategies. Any one approach
used alone is doubtless of limited use. Nevertheless, when
applied in conjunction with other approaches, observational assessment has a
tremendously broad range of uses. As illustrated in the examples we cited,
systematic observational assessment may be of help to the teacher in planning
instruction for individual children, to the researcher in charting the course
of child development, to the evaluator in assessing the character, processes
and outcomes of specific educational programs, and to the parent in
understanding and promoting the learning of his or her child.

Nevertheless, observational approaches, like all forms of assessment, 
have weaknesses as well as strengths. First, we should point out that 
the same standards of technical quality pertain to observational techniques 
as to other forms of assessment. One must consider whether such observations 
are valid and reliable and provide a basis of comparison appropriate to the 
intended use. Validity of observations is important because research has
shown that different observation systems that appear to measure the same
sort of behavior can yield different results because of the way observation
categories are defined or operationalized (Borich et al., 1977). Jane
Stallings (1977) provides a specific example of the problem of obtaining
reliable observations:




The physical environment of the classroom (its size,
shape, lighting, ventilation, and noise level) was
considered important to the process of educating children.
We tried to record this kind of information during
our first two years of observation [of FT projects] but
found it impossible to get observers to agree on what
was "light enough" or "cool enough" or "quiet enough."
Therefore, since we could not establish reliability
among observers, we deleted these items from subsequent
observations. (p. 26)
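One way agreement among observers might be checked is sketched below. The source does not say which reliability index was used; the two indices here (simple percent agreement and Cohen's kappa, which corrects for chance agreement) are common choices substituted for illustration, and the ratings are invented.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of items on which two observers gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(ratings_a)
    p_observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Invented ratings: was each of ten classrooms "quiet enough"?
observer_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
observer_b = ["yes", "no", "no", "yes", "yes", "no", "no", "no", "yes", "yes"]

agreement = percent_agreement(observer_a, observer_b)
kappa = cohens_kappa(observer_a, observer_b)
print(round(agreement, 2), round(kappa, 2))
```

In this invented case the observers agree on six of ten classrooms, but much of that agreement could arise by chance, so the chance-corrected index is far lower. Low values of this kind are the sort of evidence that would lead to dropping an item, as in the passage quoted above.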

Observational techniques for assessment have several other potential
limitations which should be mentioned. For one thing, these approaches can
be relatively expensive and time-consuming. Moreover, in order to produce
valid and reliable measurement, special training of observers often is
required. For example, before they are allowed to collect data using SRI
International's Classroom Observation System for research purposes,
observers are required to attend a seven-day training session and pass
a criterion test (Stallings, 1977).

In addition to these practical limitations, observational approaches
to early childhood assessment share a potential weakness common to all
forms of assessment. The danger is simply that in focusing on observable
behaviors or traits, or on available work samples, it is all too easy to let
assessment become a goal in and of itself, concentrating on that which can
be assessed easily, and to ignore broader issues in children's development
and learning, forgetting the ultimate goal of how to promote that
development and learning.

This chapter has provided only a very brief introduction to observational 
approaches to early childhood assessment. For references on sources of 
further information, see Appendix 1.






IV. USES OF EARLY CHILDHOOD ASSESSMENT INSTRUMENTS

It is difficult to evaluate evidence on the utility of a test or
assessment procedure without considering the particular use to which the
test or procedure is to be put. An assessment procedure may be good for
some purposes but not at all for others. This point was made in several
ways in the last two chapters. It is a simple notion, but one frequently
overlooked in discussions of the technical quality of assessment
procedures. Hence this section surveys alternative uses of educational
assessment, and discusses them in light of the special issues of testing
and assessment of young children.

First, however, we need to decide how to divide up the set of potential
uses of assessment information for the sake of discussion. There are
several ways one could do this. One reasonable way, suggested by a recent
NIE report on testing, divides assessment use into four broad categories:

• To hold teachers, schools, and school systems accountable

• To make decisions concerning individual students

• To evaluate educational innovations and experimental
projects

• To provide guidance to teachers in the classroom.

(White & Tyler, 1979, pp. 7-8)

In the following pages we will discuss the special considerations bearing on
use of early childhood tests and instruments for these four categories of
use. Since use of assessment information for program evaluation is
particularly salient with respect to Title I, we will discuss this type of
use last, and in more detail than the others.






Also, since the distinction between norm-referenced and criterion-refer- 
enced assessment is relevant to types of use, let us spell out what is meant 
by these two terms. A norm-referenced test or assessment is designed to com- 
pare an individual's performance to that of others called a norming group or 
standardization sample. Criterion-referenced assessment is designed to com- 
pare an individual's performance not to that of other individuals, but to 
some other standard, such as a prespecified criterion score, or a domain of 
items or type of behavior. The distinction rests upon how assessment instru- 
ments are designed, not on how they are interpreted, since any test or assess- 
ment results can be interpreted in either norm- or criterion-referenced fashion. 
Thus one always should look beyond the labels of "criterion-referenced" and 
"norm-referenced" to investigate the content of an instrument and the exact 
manner in which it has been developed. 

Assessment procedures for young children, for example, often are normed
in terms of age rather than of grade level. Perhaps the most famous example
of age-normed assessment of young children is Dr. Spock's The Common Sense
Book of Baby and Child Care. The practice of age norming assessments of
young children reflects two points noted earlier. First, before school entry
the social and educational experiences of young children are diverse; hence
there is no social or educational experience sufficiently common to most
preschool children to provide a basis for norm-referenced test
interpretations. Second, the age norms available for young children reflect
the rapid development and change of children in their first five or six
years of life. Gesell, Ilg and Ames's (1974) Infant and Child in the Culture
of Today, for instance, provides behavior norms for the following ages:
4 weeks, 16 weeks, 28 weeks, 40 weeks, 1 year, 15 months, 18 months, 2 years,
2½ years, 3 years, 3½ years, 4 years, 4½ years, 5 years, 5½-6 years. The
exact ages at which certain behavior may be manifest will of course vary
considerably with both individual characteristics and environmental
influences, as Gesell et al. point out repeatedly. This variability is what
makes the use of norms with young children so difficult. Research clearly
suggests that not until around age nine (grade 3) has as much as 50 percent
of the general achievement pattern at age 18 (grade 12) been developed
(Bloom, 1964, p. 105). In other words, patterns of educational achievement
are far more variable in the early childhood years (below grade 3) than in
later years of schooling. From the assessment perspective, this suggests,
as we said earlier, that measurement of young children is more difficult
than that of older ones. Yet from an educational point of view, this finding
has also been viewed as an opportunity. The great variability in young
children's achievement and behavior has contributed to the theory that early
childhood is a critical period for intervention, a time in which relatively
minor alterations in environment can have immediate or long-term
developmental consequences (White et al., 1973). But whatever its
implications for educational practice, such variability makes
norm-referenced interpretations of young children's performance particularly
difficult. This in turn has implications for alternative uses of assessment
information.

ADMINISTRATIVE AND PUBLIC ACCOUNTABILITY 

As a recent NIE conference report on testing noted, educational assess- 
ment is used for a variety of accountability functions: 

Many principals, superintendents, and other education author- 
ities use test scores, particularly scores on achievement 
tests, as a rough gauge of the adequacy of the performance 
of a teacher, a school, or a larger administrative unit. 
Parents, voters, and legislators also use such information 
in judging schools and school systems. The results of a
test are taken to indicate the amount of learning accomplished
by the average student in a classroom or larger unit.

(White & Tyler, 1979, p. 7)






As this account suggests, there are two major strands to the accountability
functions of educational assessment: one for administrators and others
directly and explicitly responsible for educational programs, and the other
for parents and the public generally, who ultimately hold the authority for
public education in the United States. The role of systematic educational
assessment in both forms of accountability appears to be on the increase.
In terms of administrative accountability, more and more educational programs
require assessment of one sort or another. This is of course often tied to
program evaluation functions, which will be treated later in this chapter.

The public accountability function of assessment, particularly standard- 
ized testing, has a longer history than formal program evaluation. Though 
testing has been explicitly tied to formal educational accountability schemes 
in recent years, test results have long served as a prime means by which 
the public judges the quality of schools. In some cities, newspapers have 
long published test results school by school. Real estate agents sometimes 
cite schools' test results to prospective buyers to entice them to buy homes 
in particular neighborhoods. Parents often are informed of their children's 
educational status in terms of test results. 

In all such public accountability uses of educational assessment, there
appears to be a strong tendency to rely on normative comparisons. One
school's test results are compared to those of other schools. People want
to know not just how many scholarships were awarded to seniors in high
school A, but whether this was more or less than in other high schools in
the area. Parents often want to know not just whether Johnny is doing
okay in school, but how he is doing with respect to his peers. Desire for
normative comparisons appears to be one important reason for the continuing
prominence of norm-referenced tests in educational assessment. One large
city school superintendent, for example, was publicly asked why his schools
continued to employ norm-referenced tests, despite the fact that they had
developed an elaborate system of criterion-referenced assessment. He replied
that the majority of taxpayers in his district, who do not have children in
school, neither were familiar with nor understood criterion-referenced
results. "We show them norm-referenced results," he recounted, "to
demonstrate the validity of what we are doing" (Haney, 1978, p. 5).

This tendency in the accountability function of educational assessment
probably also helps to explain the continuing use of grade-equivalent scores
in American education. Experts in educational measurement have long warned
against grade-equivalent scores, because they are often misunderstood and
misinterpreted (APA, AERA & NCME, 1974). Nevertheless, at least until
recently, schools continued to rely heavily on grade-equivalent scores
because they provided a familiar means of educational accounting.
Grade-equivalent scores, despite serious problems of frequent
misinterpretation, seem to remain popular simply because, as one observer
recently put it, people think they understand what these scores mean, even
if they do not.

These issues have implications for the use of early childhood assessment 
results for accountability functions. First, because of the limitations, 
or in many cases the nonexistence, of early childhood norms, it may be hard 
to report and interpret early childhood assessment results for public con- 
sumption. For children aged 3 to 4 or younger, age norms may provide a useful 
framework for interpreting assessment results. Yet by age 4 to 5, when
children typically experience their first formal schooling, use of age norms
becomes more hazardous. As we noted in Chapter 2, early educational
experience can sharply affect young children's educational performance.
Unless this is taken into account, assessment results may inadvertently
reflect the presence or absence of such experiences. Hence, normative
comparisons, commonly made for accountability purposes at later grade levels, can
be extremely difficult, if not altogether impossible, to carry out in a reasonable
way at the early childhood level. One way to get around this problem is to
report assessment results directly in terms of the assessment tasks employed:
for example, instead of reporting normatively that children scored at the 70th
percentile on a letter recognition test, to report in criterion-referenced
fashion that 75% of them could recognize at least 20 letters of the alphabet.
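
The criterion-referenced style of reporting in the example above can be made concrete with a short sketch. The scores below are invented for illustration and the function names are hypothetical; only the criterion of 20 letters comes from the example in the text.

```python
def percent_meeting_criterion(scores, criterion):
    """Return the percentage of children scoring at or above the criterion."""
    if not scores:
        raise ValueError("no scores to summarize")
    meeting = sum(1 for score in scores if score >= criterion)
    return 100.0 * meeting / len(scores)


def criterion_report(scores, criterion, skill):
    """Describe performance directly in terms of the assessment task."""
    pct = percent_meeting_criterion(scores, criterion)
    return f"{pct:.0f}% of children could {skill}"


# Hypothetical data: number of letters (0-26) each of eight children recognized.
letters_recognized = [26, 24, 22, 21, 20, 20, 18, 12]

print(criterion_report(letters_recognized, 20,
                       "recognize at least 20 letters of the alphabet"))
# prints "75% of children could recognize at least 20 letters of the alphabet"
```

A report of this kind needs no norm group at all, which is why it sidesteps the norming problems described above; the cost is that it says nothing about how these children compare with others.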

A second and related issue in accountability uses of early childhood 
assessment has to do with the object of accountability. When high school 
students cannot read, or conversely, when they win numerous scholarships to
college, this clearly reflects something about the schools they attend. Yet
when young children, say in kindergarten, lack certain skills or are proficient
in particular ways, it is often unclear to what extent this should be attributed
to educational programs, to children's home and family background, to the
particular characteristics of the children involved, or to other factors.
In short, the potential use of early childhood assessment for general
accountability purposes, at least in ways traditionally used with standardized
test results, seems to be somewhat less than that of assessment of older
children. While there is not much good evidence on this point, several
aspects of early childhood testing and assessment make this contrast plausible.
Two points, discussed above, are: 1) the various problems of using norms
with early childhood tests, and 2) the intertwined responsibilities of
educational institutions and home and family for the early educational
development of young children. This suggests that alternative approaches
to accountability may be useful at the early childhood level: approaches which
seek to describe children's educational performance directly, rather than
assessing it normatively or attributing causes for the performance.

MAKING DECISIONS CONCERNING INDIVIDUAL STUDENTS

Assessment results are also used to inform a range of decisions on individual
students. At school entry they are used to help determine whether
children are ready for reading instruction, or should be placed in special
classes for the retarded or the gifted. Later in children's education, test
results may be used to determine their eligibility for special programs such
as Title I, and to assign them to different curriculum tracks in high school.
Later still, in college or in the labor market, assessment results may affect
admission or hiring and promotion decisions. Thus educational assessment plays
a part in decisions about individual students throughout their educational
and working careers.

Note that we are referring here only to major decisions concerning
educational placement, promotion, and admissions, not to the shorter-term,
less formal decisions, such as instructional guidance, which will be discussed
separately in the next section. Nevertheless, even if we restrict our
attention to major educational decisions, the use of assessment results appears
to be increasing. Within the last few years, for instance, [illegible]
have begun competency testing programs to control grade-to-grade promotion or
to provide a basis for awarding high school diplomas.

These practices raise several issues. In selection decisions for college 
or jobs, the use of tests has traditionally been justified by demonstrations
that they have predictive validity: for example, that a college admissions
test could predict student grades in college, or that job selection test
results correlated with actual job performance. In the past, much has been
written on issues of predictive validity, and particularly on bias in selection
tests in terms of differential predictive validity.*

In the past few years, however, discussions on the use of assessment results
for making analogous decisions about students at earlier levels in the
educational system have taken a somewhat different direction. Rather than
worrying simply about how well assessment results predict the performance
of those selected for special opportunities, people involved in making decisions
on selection and assignment of younger children have become more
concerned with the consequences of selection, for those not selected as well
as for those who are. In special education, for example, concern for both the
negative and positive consequences of selecting and not selecting children
for special programs has prompted enthusiasm for mainstreaming, that is, the
integrating of children with special needs into regular classrooms, instead
of segregating them in separate classes and possibly thereby stigmatizing
them (Hobbs, 1975; Wolfensberger, 1972). Also, in the recent literature on
competency testing, doubt has been raised about consequences of using such
tests to promote students from grade to grade or to make them repeat a grade.
In this light critics ask not just how well tests predict how children will
do in the future, but how well they match what children have been taught in
the past, and how much they help to improve what they learn in the future.

These views on use of assessment results for making decisions about 
individual students have special relevance at the early childhood level. 

* See the Journal of Educational Measurement, 1976, for some good
articles and references on this topic.

Readiness testing, for example, has a long tradition in early childhood
education in America, but commonly used readiness tests actually cover
very different sets of skills. Research suggests, too, that contrary to common
opinion, separating "unready" children into transition classes, for example
special kindergarten/first-grade classes, may not enhance their learning
(Leinhart, 1980).

Before one can sensibly assess which readiness test to use, one must ask,
readiness for what? The appropriateness of a given test to inform placement
decisions will vary depending on the educational programs concerned. Also,
the practical problems of assessment with young children described in
Chapter 2 all caution against over-reliance on test results in making major
decisions about young children. Because of these considerations, the following
guidelines, widely accepted with respect to test use generally, are
especially pertinent to the use of assessment results in making decisions
about young children:

• A test user should consider more than one variable for assessment,
and the assessment of any given variable by more than one method.

• The test user, in interpreting an obtained score, should consider
the total context of testing before making any decisions (including
the decision to accept the score).

• A test user should consider alternative interpretations of a given
score.

(APA, AERA & NCME, 1974)
These guidelines serve to reemphasize the point noted earlier. Instead of
relying simply on one form of assessment for making decisions about educational
placement of young children, one should take into account alternative forms
of assessment.
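
As a rough illustration of the first guideline (more than one variable, and more than one method per variable), the sketch below checks how well the evidence collected on one child meets it. The data layout, the function name, and the sample entries are all assumptions made for this example; they are not drawn from the 1974 standards, and the check describes the evidence rather than making any placement decision.

```python
from collections import defaultdict


def review_evidence(observations):
    """Summarize evidence on one child.

    observations: list of (variable, method, flagged) tuples, where
    flagged is True if that method raised a concern on that variable.
    """
    by_variable = defaultdict(dict)
    for variable, method, flagged in observations:
        by_variable[variable][method] = flagged

    # Variables assessed by only one method fall short of the guideline.
    single_method = sorted(v for v, m in by_variable.items() if len(m) < 2)
    # Variables on which methods disagree call for alternative interpretations.
    disagreeing = sorted(
        v for v, m in by_variable.items() if len(set(m.values())) > 1
    )
    return {
        "variables_assessed": len(by_variable),
        "single_method_only": single_method,
        "methods_disagree": disagreeing,
        "meets_multiple_measures": len(by_variable) > 1 and not single_method,
    }


# Hypothetical evidence on one kindergarten child.
evidence = [
    ("letter recognition", "group test", True),
    ("letter recognition", "teacher observation", False),
    ("oral language", "individual test", False),
]
summary = review_evidence(evidence)
print(summary)
```

In this invented case the two methods disagree on letter recognition, and oral language was assessed by only one method, so the summary points to gathering more information before any placement decision is made.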



GUIDANCE TO TEACHERS IN THE CLASSROOM 

A third class of uses of systematic assessment is to guide instruction,
that is, to provide information and feedback to teachers as opposed to
informing major administrative decisions. It is in this domain of use that
early childhood assessment appears to be potentially most useful; at least
this was suggested by a recent nationwide survey asking teachers how they
used standardized achievement test results in their classrooms. More than
50 percent of responding kindergarten to grade 4 teachers replied that they
used test results in only four of the ways the survey suggested:

    Diagnosing strengths and weaknesses    77%
    Measuring student growth               71%
    Individual student evaluation          65%
    Instructional planning                 52%

(Beck & Stetz, 1979, Table 4)
Nevertheless, though tests appear to be relatively useful in guiding
instruction, some observers have been highly critical of their usefulness
for this purpose. The recent NIE report Testing, Teaching and
Learning, for example, recounted the following:

Several national educational groups have called for a 
moratorium on testing. It is argued that standardized 
tests have no positive direct usefulness in guiding 
instruction, and their indirect influence, implicitly
laying down goals and standards, disrupts or blocks
teaching. Despite inclusion in the published tests of
various subtests to identify a student's strengths and
weaknesses, critics say the categories are so broadly
defined, the tests are given so infrequently, and the
time from test administration to report of results to 
teachers is so long that tests do not help teachers in 
their work. 

(White & Tyler, 1979, pp. 9-10)
Such criticism suggests several characteristics that may make assessment
results more useful in guiding instruction. First, they must be relevant
to the goals of instruction; that is, they must have what we called instructional
validity in Chapter 2. Second, they must provide specific
information on particular aspects of student learning. Third, they must
provide feedback to teachers within a short time.

The first two characteristics, instructional validity and specificity
of results, are two of the prime concerns behind the growth of interest
in criterion-referenced testing within the last decade. In his recent book
criticizing norm-referenced testing and advocating criterion-referenced
testing, James Popham, for example, argued as follows:

• [illegible] in norm-referenced achievement [illegible]
is [illegible] mismatches between what
is tested and what is taught.

• [illegible] available norm-referenced
test results to remedy ineffective instructional programs.

(Popham, 1978, p. 84) 
These concerns are obviously relevant to the use of early childhood assess- 
ment to guide instruction. If it is to be so used, assessment must 1) be 
matched to the goals of instruction, that is, have instructional validity;
2) provide specific information on individual children's strengths and 
weaknesses; and 3) allow rapid feedback of that information. Several of 
the special issues of early childhood assessment bear on these considera- 
tions. Readiness tests, for example, cover a range of skills that often 
are included within the goals of early childhood instruction; but as noted 
in Chapter 2, different readiness tests cover very different sets of such 
skills. In terms of match with goals of instruction, early childhood 
assessment is particularly weak in one area: social and emotional development,
an important domain of early childhood instruction. As Walker noted
in her book Socio-emotional Measures for Preschool and Kindergarten Children:

Very few [such] instruments have adequate standardization
norms that are representative for a wide range
of children of varying ethnic groups, intelligence 
levels and socio-economic backgrounds. Generally the 
ones that do exist are very poor and inadequate since 
they are based on extremely small or narrowly defined 
populations of children. 

(Walker, 1973, p. 37) 
It is because of the weaknesses of paper-and-pencil measures of children's 
socio-emotional characteristics that Walker suggests the potential value 
of observational techniques of the sort described in Chapter 3. 

In at least one respect, however, early childhood instruments may have 
more potential than later-grade tests for providing information useful in 
guiding instruction. As noted in Chapter 2, many early childhood assess- 
ment instruments are individually rather than group administered. When 
they are individually administered by the classroom teacher, she or he 
gains specific information immediately, even before scoring the test. If 
the information is keyed to particular goals of instruction, it can be 
immediately useful to the teacher in planning instruction. Thus, for the 
purpose of instructional guidance, early childhood assessment when keyed
to instructional goals and administered individually appears to have more
potential utility than group administered tests, the results of which may
not be available to teachers until weeks after the tests are given.
EVALUATION 

A fourth class of assessment use is for evaluating educational programs
and innovations. It is probably in this area that there has been
the greatest increase in systematic educational assessment within the last
two decades. As the NIE report Testing, Teaching and Learning put it:

[illegible] through
use of standardized [illegible]. A recent wave
of experimental projects [illegible] the curriculum reform
movement in science [illegible] in the 1960's,
when widespread efforts [illegible] to improve the
education of children [illegible].
Projects continue to [illegible] the task of matching
tests to project objectives. [illegible] In one case, experimenters
have found [illegible]
projects and have [illegible] to [illegible]

(White & Tyler, 1979, p. 8)

[illegible] tests
[illegible] in program evaluation
(Carver, 1974; Popham, 1978; [illegible]).
[illegible] discriminating among individual test takers, [illegible] such
tests are constructed [illegible] insensitive to effects of
local school [illegible], which may have different curricula [illegible]
increasingly being used to [illegible] educational programs and to guide
[illegible]. However, precisely [illegible] of the way [illegible],
norm-referenced tests tend to be insensitive to the instructional effects
of particular educational programs. Hence, new types of tests are required
for the purposes of program evaluation.

More [illegible] critics of norm-referenced tests have [illegible] this
argument; they predict that the [illegible] of [illegible] will
usher in a new period of educational [illegible], "the criterion-referenced
measurement era" (Popham, 1978, p. 2, emphasis in original). More moderate
observers have suggested merely that curriculum-sensitive tests can play 
an important role in program evaluation, even though norm-referenced tests 
may continue to play a valuable role in comparisons of the educational
outcomes of programs that emphasize different aspects of instruction (Madaus
et al., 1979).

These and other criticisms have frequently been leveled against 
recent efforts to evaluate program impact. Five of the most common made 
with respect to early childhood programs are the following:

• There is often a real mismatch between the broad goals of
early childhood educational programs and narrow test-based
evaluations of them.

• There is often a great discrepancy between the long-range
goals of early childhood programs (e.g., to prepare children
to learn more in later schooling) and the short-term nature
of most impact evaluations of them (e.g., end-of-program
test scores).

• There has been a widespread failure to adequately describe
the educational programs being evaluated, and to determine
whether or not the program ostensibly being evaluated
actually was implemented as intended.

• Most impact evaluations of early childhood programs yield
few if any clearcut findings.

• Because of these problems among others, few impact evaluations
provide information which is of much direct use in
decision making or in improving programs.*

These criticisms obviously raise issues well beyond the mere use of tests
in evaluating early childhood educational programs. Indeed, for that
reason, and because impact evaluation encompasses far more than simply
testing and assessment, a range of issues in early childhood program
evaluation is treated separately in other resource books in this series.

* For more information on such criticisms with respect to past evaluations
of early childhood educational programs, see Haney et al., 1978, pp. 32-46.

Nevertheless, several points should be made here with respect to 
using early childhood assessment instruments for program evaluation. First 
and foremost, the degree of match between early childhood programs and 
the test or tests used to evaluate them must be considered. While a case
certainly can be made for testing aspects of children's development that
are not encompassed in program goals, this should not be done inadvertently,
for unintended mismatches between program goals and test content may
affect evaluation results in misleading ways. This is especially true
of early childhood assessment, where some common goals (for example, in
the social and emotional domain) cannot be measured well with available
tests, and in particular with paper-and-pencil tests. For this reason,
observational techniques like those described in Chapter 3 may be especially 
valuable for early childhood program evaluation. 

Second, since the main aim of educational evaluation, as opposed to
educational research, is to inform decision making and to improve educa- 
tional programs, one should closely consider the exact purpose an evalua- 
tion is to serve before selecting an assessment instrument. This applies 
not only to the content of instruments, but also to their form and to the 
way in which results can be derived from them. At the early childhood 
level, as we suggested in Chapter 2, the form of assessment (e.g., whether 
individually or group administered) can significantly affect results. In 
terms of how test results are reported for purposes of program evaluation,
one should be particularly careful with norm-referenced results. This
caution is especially important at the early childhood level, because
of the problems of norming tests with young children. The previous
educational experience of children tested must closely match that represented
in the norm group, or else norm-referenced results can be badly
misleading.

In summary, if early childhood tests and instruments are used in 
evaluations of early childhood programs, one must give close attention 
not only to the special issues of early childhood assessment, but also 
to the particular goals of the programs to be evaluated. 

USING EARLY CHILDHOOD ASSESSMENT FOR MULTIPLE PURPOSES 

Assessment can serve many different functions. In this chapter, 
we have reviewed four different classes of such functions, dealing with:

• Accountability

• Making decisions about individual students

• Providing guidance for instruction

• Program evaluation.

As we pointed out at the start of the chapter, these distinctions are
somewhat arbitrary. Program evaluation, for example, often serves
accountability functions, and sometimes provides useful guidance regarding
instruction, if not for individual students, at least regarding
instructional practices at the classroom, school, or district levels.

Nevertheless, though they sometimes overlap, it is clear that different
functions may require different forms of testing and assessment, or at least
that different functions may pull assessment in different directions. For
some instructional purposes, for example, criterion-referenced results, work
samples, or other observational approaches may be more useful than norm-referenced
results, but for accounting to the public, norm-referenced results
sometimes may be more useful. For some kinds of program evaluation, one
might for technical reasons choose an assessment that yields results in the
form of a standardized metric, whereas such a metric might be totally useless
for public accountability functions. For purposes of evaluating social goals
of early childhood programs or for research reasons, observational approaches
may be especially useful even though they may prove cumbersome or too expensive
for other purposes. Such contrasts suggest that different types of assessment
should be used for different functions, or at least that, if one assessment
is to serve different functions, it may have to be used in different
forms, or the results reported in different ways.

These points will be summarized in Chapter 4. Here, let us briefly
describe two issues that are relevant to systematic assessment for
any function: the related issues of test bias and assessment with children
who do not speak English as their first language.

Cultural Bias 

Many people believe that standardized tests are biased against black
and other minority children. A recent incident highlighting this concern
was the finding by a federal court judge that standardized intelligence
tests used in the state of California are racially and culturally biased
and discriminate unfairly against black children. Ruling that intelligence
tests have not been validated for the purpose of essentially permanent
placement of children into educationally dead-end, isolated, and stigmatizing
classes for the so-called educable mentally retarded, the court enjoined
the state of California from using IQ tests to place black children in such
classes (Larry P. v. Riles, 1979, pp. 3-4).

This ruling, though it applies legally only within the northern 
judicial district of California, nevertheless clearly highlights the wide- 
spread concern that standardized tests may be biased against minority children. 
There remains disagreement, however, over how to tell whether or not a parti- 
cular test is biased. Different experts have proposed different definitions 
of test bias and different statistical methods for detecting fair use of 
tests (see Flaugher, 1978, and Petersen & Novick, 1976, for a review of
these two issues respectively).

In light of the continuing debate over test bias, it is hard to propose 
specific remedies for this problem. Nevertheless, two general suggestions 
are appropriate. First, in selecting any test or assessment procedure for 
use with young children, one must consider whether its content and form are 
appropriate to the children's culture and background. Second, statistical
analyses of test results may be irrelevant to issues of test bias if they
ignore how assessment results actually are used. In other words, like
validity, bias cannot be clearly determined in the abstract without
taking into account how and with whom the assessment is to be used.

Language Considerations

A particular form of the general problem of cultural bias in assessment
is the issue of assessment of children whose native tongue is not standard 
English. This problem has most often been discussed with respect to Spanish- 
speaking children, but is obviously relevant to any children who do not 
speak standard English as their native tongue. 

There is growing awareness of the importance of bilingual education
for such children, and this awareness often extends to include concern for
culturally sensitive assessment of children who do not speak English as
their first language. Many people, for example, now recognize the importance
of conducting assessments in the native language of the child if valid conclusions
about his or her general educational development are to be drawn.
There is not space here to treat issues of bilingual assessment in any
detail (see Padilla, 1979, for a good recent survey of the literature on
testing of Hispanic Americans). Nevertheless, a few general points can be
mentioned. First, it is important to distinguish linguistic or cultural
differences from other educational attributes, lest they be mistakenly
interpreted as some sort of general learning deficit. Second, even when
assessment is carried out in children's native tongues, the results of such
assessments cannot be interpreted as being equivalent to those of English-
language assessment; that is, merely translating a test into Spanish does
not mean that its results with Spanish-speaking children are equivalent to
results from the English version with English-speaking children. Third,
issues of assessment with children who do not speak English as their native
language must be viewed in light of the purposes of assessment. Using an
English-language test with such children may be appropriate if the goal is 
to guide English-language instruction, but quite inappropriate if it is to 
measure children's general reading or math achievement. Fourth, although 
the problem of cultural bias in written language tests is widely recognized, 
it is often overlooked that assessment which relies on pictures may carry 
a problem of cultural dependency as great or even greater. Anastasi (1976), 
for example, argues: 



...an item requiring that the names of the seasons be
arranged in the proper sequence would be more appropriate
in a cross-cultural test than would an item using pictures
of the seasons. The seasons would not only look different
in different countries for geographical reasons, but they
would also probably be represented by means of conventionalized
pictorial symbols which would be unfamiliar to
persons from another culture. (p. 347)

We have devoted special attention to the issues of test bias and 
assessment with children who do not speak English as their first language
because these issues are particularly pertinent to early childhood testing
and assessment. As we noted in Chapter 2, young children's performance
on tests and other assessment tasks is easily affected by extraneous 
factors, including aspects of culture and language. Thus at the early 
childhood level, one needs to be especially attentive to potential cultural 
and language bias, regardless of the specific uses for which assessment 
is intended. How to minimize such problems is, of course, itself a problem. 
Nevertheless, in the next chapter we will offer some practical suggestions 
on how to deal with these issues in selecting and using early childhood 
tests and instruments. 



V. SELECTING AND USING EARLY CHILDHOOD ASSESSMENT INSTRUMENTS

Given all of the potential problems in the testing and assessment of 
young children, how can one sensibly go about selecting and using an early 
childhood test or other assessment instrument? This is the question addressed 
in this section. We treat the question in three parts: screening potential
instruments; trying out likely ones; and finally, using and interpreting 
results. The suggestions are often fairly general, for the simple reason that 
successfully selecting and using an early childhood test or observational 
instrument for any particular purpose will depend to a great extent on the 
specifics of that purpose, the conditions of assessment and the care with which 
results are considered and interpreted. 
SCREENING POTENTIAL INSTRUMENTS 

The primary points to consider in selecting any early childhood assessment
device can be labeled simply as purpose and people. The first thing to
consider is the exact purpose for which one intends to conduct an assessment.
As noted in the last chapter, one assessment device may be good for some uses
but altogether unsatisfactory for others. The second thing to be kept in mind
is people: is the assessment procedure appropriate for use with the type of
people (young children) with whom it is to be used? For example, group
administered tests generally have limited validity for use with children below
the age of six or seven.

With these points in mind, one should screen potentially useful instruments.
Appendix 2 lists over one hundred early childhood test instruments and observation
systems together with sources of additional information on each. This list
is provided simply to illustrate types of instruments and give sources of further
information. The fact that a particular instrument is listed should not be
taken to mean that it is endorsed for any particular purpose, and the fact that
an instrument is not listed should not be taken to mean that it ought not
be considered.

Screening of potentially useful instruments can be conducted efficiently
in two steps. As an initial step, one needs only to review basic descriptive
information for instruments that seem potentially useful. Examples of information
for such initial screening are given in Appendix 3. As suggested
in this appendix, initial screening of instruments can be accomplished simply
by examining five characteristics of potentially useful instruments, namely,
the type of instrument, the use intended for it by the publisher or developer,
the population for which it is intended, its format, and its content.
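
For readers who keep their instrument notes in electronic form, the initial screening step just described might be sketched as follows. The record layout, the field names, the age-range encoding of "intended population," and the three sample instruments are all hypothetical and chosen only for illustration; real instruments often have several intended uses, which this simplified sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentSummary:
    """The five characteristics examined in an initial screening."""
    title: str
    instrument_type: str        # e.g., test or observation system
    intended_use: str           # use intended by publisher or developer
    min_age: int                # intended population, in years (assumed encoding)
    max_age: int
    administration_format: str  # e.g., individual or group
    content: str


def initial_screen(instruments, *, use, child_age, required_format=None):
    """Keep instruments whose intended use, population, and format fit."""
    kept = []
    for inst in instruments:
        if inst.intended_use != use:
            continue
        if not (inst.min_age <= child_age <= inst.max_age):
            continue
        if required_format is not None and inst.administration_format != required_format:
            continue
        kept.append(inst)
    return kept


# Hypothetical entries; not taken from Appendix 2 or Appendix 3.
candidates = [
    InstrumentSummary("Instrument A", "test", "instructional guidance",
                      3, 5, "individual", "letter recognition"),
    InstrumentSummary("Instrument B", "test", "program evaluation",
                      5, 7, "group", "reading readiness"),
    InstrumentSummary("Instrument C", "observation system", "instructional guidance",
                      6, 8, "observation", "social behavior"),
]

shortlist = initial_screen(candidates, use="instructional guidance", child_age=4)
print([inst.title for inst in shortlist])  # prints ['Instrument A']
```

A filter like this only narrows the field. Content still has to be judged by reading the instrument against one's own purpose, which is why it is left as a descriptive field rather than a screening rule.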

Candidate instruments which seem potentially useful in terms of these
characteristics can then be subjected to a more intensive review. Specifically,
one should screen potentially useful tests and instruments with respect to
four categories of information:

• General information regarding the type and intended use of the
instrument

• Theory, construction, and development of the instrument

• Practical requirements of the instrument

• Technical qualities of the instrument.

Table 1 outlines the kinds of specific information under these categories
that ought to be considered in choosing assessment instruments. Appendix 4
provides some examples of detailed instrument reviews in this format (again,
however, the fact that particular instruments are reviewed in Appendix 4
should not be construed as an endorsement of them). Here let us simply
explain the sort of questions which should be addressed with respect to each
category of information, and why such questions are important.

Table 1.

Outline of Information for Screening Assessment Instruments

Title:
Developer or Author(s):
Source or Publisher:
Copyright date or date of development:
Price:

I. General Descriptive Information
    Type of instrument
    Intended use
    Intended population
    Format
    Content

II. Theory, Construction, and Development of Instrument
    When and how instrument was developed
    Manner in which items or assessment tasks were selected
    Population or program for which instrument was developed

III. Practical Requirements
    Materials required
    Type of administration
    Time and setting for administration
    Directions
    Sample questions
    Scoring procedures
    Language of administration
    Training needed to administer

IV. Technical Information
    Norms or other standards of comparison
    Scales and scores
    Validity
    Reliability

V. Outside Reviews
    Published reviews
    Opinions of others who have used the instrument

VI. Comments
    General
    Theory, construction and development
    Practical considerations
    Technical qualities

VII. References

As in making an initial review of early childhood assessment procedures
which might be adopted, several sorts of general information need to be
considered in making a detailed instrument review. What type of instrument
or procedure is it? For what types of use and populations was it developed?
What is the format and content of the instrument? In considering answers to
these questions, one needs, of course, always to keep in mind one's own intended
purposes for undertaking early childhood assessment.

With respect to theory, construction, and development of assessment instruments,
one needs to ask whether each of these aspects of an instrument
is reasonable and compatible with the use to which one wants to put it. If
an instrument was constructed in terms of a specific psychological theory
which seems irrelevant to the intended use, then one may want to reject it.
If a test was constructed so as to discriminate between individual test takers
regardless of their educational background, it may not be terribly useful in
evaluating a particular educational program. Finally, if the intended use
for an instrument corresponds to one of the uses which the instrument developer
or publisher intended, then one can probably have more confidence that
the instrument is a reasonable choice.

In terms of practical requirements, one should consider the accessory
materials available with the instrument and the requirements for administering
and scoring it. Tests which allow marking of answers directly in the
test booklet, for example, are almost always more appropriate for use with
early elementary school children than are those in which answers are marked
on a separate answer sheet. Similarly, observation scales or individually
administered tests in which an adult records children's responses generally
are more appropriate for young children who have not mastered clerical
test-taking skills. Also, tests or assessments which can be administered in short
sessions of .J to 20 minutes are generally preferable to those which require 
longer administration periods. The exceptions, of course, are instruments
that are individually administered and allow some flexibility of administration
to help maintain children's attention, and observation instruments that do not
intrude directly on the children's activities. Scoring requirements may also
influence whether or not a test is useful for a certain purpose. Tests that
can be hand-scored by the teacher may be more useful in providing information
for instructional guidance than those which are machine-scored and returned
to the teacher only after delays of a week or more. On the other hand,
instruments which are not scored simply right/wrong, but which entail some
judgment and interpretation in scoring, may require training for those doing
the scoring.

In terms of technical quality, one should consider the available evidence
on the validity and reliability of the instrument and the characteristics of
norms provided with the test, if it is norm-referenced. If the test is to be
used for program evaluation, for example, one must carefully review its content
in light of the goals and objectives of the program. Although several
people have suggested schemes for assessing the degree of match between test
and program (e.g., Porter et al., 1978; Hambleton et al., 1978; Walker et
al., forthcoming), such specific procedures will not be equally relevant for
all assessment purposes. In most cases, however, a test will be more appropriate
the more its content covers program content, and the less it covers material
irrelevant to the program.
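
The idea of match between test and program can be made concrete with a simple tally. The sketch below is only an illustration and is not one of the procedures cited above: it assumes a reviewer has already judged which program objective (if any) each test item addresses. The function name, the objective labels, and the item judgments are all hypothetical.

```python
def content_match(item_objectives, program_objectives):
    """Summarize how well a test's items line up with a program's objectives.

    item_objectives: one entry per test item, naming the objective the item
        addresses, or None if the reviewer could not tie it to any objective.
    program_objectives: the objectives the program actually teaches.

    Returns (coverage, relevance):
        coverage  = share of program objectives touched by at least one item
        relevance = share of test items that address some program objective
    """
    program = set(program_objectives)
    if not program or not item_objectives:
        raise ValueError("need at least one objective and one item")

    tested = {obj for obj in item_objectives if obj is not None}
    coverage = len(tested & program) / len(program)

    relevant_items = sum(1 for obj in item_objectives if obj in program)
    relevance = relevant_items / len(item_objectives)
    return coverage, relevance


# Hypothetical reviewer judgments for a six-item test.
program = ["letter names", "counting", "shapes", "rhyming"]
items = ["letter names", "letter names", "counting", "shapes", None, "telling time"]

coverage, relevance = content_match(items, program)
print(f"coverage of program objectives: {coverage:.2f}")
print(f"share of items relevant to program: {relevance:.2f}")
```

A test that scores high on both figures covers most of what the program teaches and wastes few items on material outside it. The tally says nothing about item quality or difficulty, so it supplements a careful reading of the test rather than replacing it.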

If an instrument is to be used for selection purposes, different sorts
of validity evidence will need to be considered. If the goal is to select
for special services children who are likely to have difficulties in later
schooling, one needs to look for evidence that the instrument has predictive
validity, that is, evidence that its results may be useful in predicting later
school achievement.



Reliability evidence likewise should be examined in light of the assessment 
purposes one has in mind. If a test is to be used to help in making decisions 
about individuals, one needs to be far more concerned about reliability 
evidence than if it is to be used merely as an indicator of group or school 
progress.
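
A rough calculation shows why reliability matters more for decisions about individuals than for group summaries. The sketch below uses the conventional standard error of measurement, SD x sqrt(1 - reliability), which is a standard psychometric formula and not something given in this booklet. The score scale, reliability value, and group size are invented for illustration, and the group figure reflects measurement error only, under the assumption that errors are independent across children.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Conventional SEM: the typical size of error in one child's score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def measurement_error_of_group_mean(sd, reliability, n_children):
    """Measurement error remaining in the mean of n independent scores."""
    if n_children < 1:
        raise ValueError("need at least one child")
    return standard_error_of_measurement(sd, reliability) / math.sqrt(n_children)


# Invented example: scores with SD 15 and a reliability of .84.
sem_individual = standard_error_of_measurement(sd=15.0, reliability=0.84)
sem_group = measurement_error_of_group_mean(sd=15.0, reliability=0.84, n_children=25)

print(f"typical error in one child's score: {sem_individual:.1f} points")
print(f"typical error in a 25-child mean:   {sem_group:.1f} points")
```

With these invented figures a single child's score carries an error of about 6 points, while the mean for a class of 25 carries an error of about 1 point. The same instrument can therefore be adequate as an indicator of group progress and still too imprecise for placing individual children.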

If one is thinking of using norm-referenced tests, then test norms need
to be considered in light of both the specific purpose of assessment and the
type of children who are to be assessed. In the words of the 1974 Standards
for Educational and Psychological Tests:

    In norm-referenced interpretations, a test user should
    interpret an obtained score with reference to sets of
    norms appropriate for the individual tested and for the
    intended use.

    (Standard J5)

One mistake commonly made in this connection is to assume that, because the
norming sample includes some individuals who are like the individuals or
group with whom a test is to be used, the test norms are therefore appropriate.
Even if a norming group contains a ten percent sample of minority children,
for example, the norms are not therefore necessarily appropriate for use with
minority children. Instead, one should examine the general characteristics of
the overall norming sample. As the test Standards put it:

    A test user should examine differences between characteristics
    of a person tested and those of the population on
    whom the test was developed or norms developed. His responsibility
    includes deciding whether the differences are so great
    that the test should not be used for that person.

    (Standard J5.3)

TRYING THE TEST OUT



After one or more likely instruments have been identified, [...] to try them
out with a sample of children [...], that is, pilot testing tests and other
[...]. It is all too easy for adults to forget [...] perceive [...], including
tests and other tasks, quite [...]. A try-out will help [...] clear [...] for
children [...] interpreted by [...] children.

Trying instruments out [...] is often [...]; in our experience it can be
immensely valuable. [...] with only one test that has [...] very helpful in
interpreting results. Recall, for instance, the example [...] in Chapter [...]
(p. 7) [...] often [...] an elephant instead of a bird as the picture that goes
best with [...] because they saw the elephant as [...] the flying elephant.
[...] revealed recently in a pilot test of a [...] screening instrument in a
southern state. On one question children were asked [...] address, and scoring
procedures called for children to receive full credit if they [...]. Hence,
scoring procedures had to be revised to take that fact into account. [...] can
be a useful means of pilot testing instruments; see
Cicourel et al., 1974; Haney and Scott, 1980; and Haney et al., 1981. Each of
these sources is described in Appendix 1.

USING AND INTERPRETING TESTS

Given the various uses of tests and assessment devices, it is hard to
offer specific advice on how early childhood tests and instruments [...].
[...] most authoritative source of [...] advice on this topic is the 1974
version of Standards for Educational and Psychological Tests. Unlike [...]
editions, [the] 1974 [version ...] special section on standards for the use
of tests.

These standards are relevant to a wide range of uses of testing and
assessment. The full document, Standards for Educational and Psychological
Tests (APA, AERA, and NCME, 1974), treats them in some detail. Anyone not
familiar with the standards may wish to read the full document. Here, let us
simply elaborate on some of the standards especially relevant to early
childhood assessment.

Regarding selection of a test or other method of assessment, consideration
should be given to assessing any given variable or attribute in more than
one way. This is particularly important in assessing young children, since
their performance and behavior may be highly variable, and since they often
lack certain test-taking skills. For example, it often is helpful to use
formal testing or assessment procedures in conjunction with teacher observation
or checklists.

Regarding administration and scoring, one should follow standard pro- 
cedures relevant to the instrument employed along with procedures that enable 
each child to do his or her best. Again, this is especially important at the 
early childhood level. Since formal assessment may be threatening or at least
unfamiliar to young children, it is vital for the test administrator or the
observer to establish rapport with children and to make sure that they feel
comfortable in the assessment situation.

Regarding interpretation, one point in particular is relevant to the
early childhood level. Assessment results should be interpreted as an estimate
of performance under a given set of circumstances; they should not be interpreted
as some absolute characteristic of the examinee or as something
permanent and generalizable to all other circumstances. Violation of this
principle has probably led to more misuse of standardized testing with young
children than any other. Children's performance may be influenced by behavior
problems, visual or hearing defects, language problems, and ethnic or cultural
factors. Thus, it is vital to consider the total context of testing or assessment
in interpreting results. In general, one should avoid use of descriptive
labels that might be misinterpreted. As the Standards point out:

    The use of a summary label connotes value judgments [...]
    interpretation. A test maker may know [...] when he uses the
    term "retarded," but [...] over the interpretation of the [...] word
    by a judge, teacher, parent or child.

    (Standard J2.3)

To help avoid problems of misinterpretation, such terms as grade-equivalent,
IQ, or IQ-equivalent should be used with utmost caution, if at all.
Both IQ scores and grade-equivalents involve severe technical problems. Serious
misinterpretations often occur, for example, when grade levels are extrapolated
beyond the range for which the test is designed. Moreover, many test users
fail to recognize the wide margin of error implicit in IQ or grade-equivalent
scores. Indeed, because of widespread misinterpretation and misuse of such
scores, many experts recommend that neither IQ nor grade-equivalent scores be

APPENDIXES AND REFERENCES

Appendix 1
NOTES ON SOURCES OF FURTHER INFORMATION

Since this booklet has provided only a brief introduction to issues in
early childhood assessment, this section provides notes on relevant sources
of further information.

GENERAL SOURCES

One helpful source of a wide range of information on early childhood
assessment is Goodwin and Driscoll's Handbook for Measurement and Evaluation
in Early Childhood Education (1980). This volume provides: a review of
basic measurement concepts; a discussion of validity, reliability, and
usability of measures; a review of observational measurement in early
childhood; and separate chapters on (1) intelligence and school-related tests;
(2) developmental and handicapped screening surveys; language, bilingual,
and creativity tests; (3) affective measures; and (4) psychomotor measures.
In addition, this handbook provides helpful reviews of (1) conceptual
frameworks for evaluation; (2) several recent large-scale evaluations; and
(3) relevant information from other fields such as sociology and anthropology.

Anastasi's Psychological Testing (4th edition, 1976) and Cronbach's
Essentials of Psychological Testing (1970) are both excellent general references on
educational and psychological testing. Anastasi's book includes two brief
sections devoted to early childhood testing and assessment, one on infant
and preschool testing (p. 266) and another on intelligence in early
childhood (p. 332).

Johnson's Preschool Test Descriptions (1979) describes 170 preschool
tests in terms of identifying information, administration, examinee
appropriateness, interpretation, technical aspects, and additional comments.
In terms of purpose, each instrument is described as emphasizing screening,
diagnosis, or achievement.

Hoepfner, Stern, and Nummedal's CSE-ECRC Preschool/Kindergarten Test
Evaluations (1971) is another potentially useful source of information
concerning early childhood instruments. This volume lists several hundred
early childhood instruments (including both full test instruments and
subtests). The instruments are organized into four broad areas concerning
the affective domain, the intellectual domain, the psychomotor domain, and
subject area achievement. Each test or subtest is rated via a point and
letter rating system in terms of measurement validity, examinee
appropriateness, administrative usability, and normed technical excellence. While
the broad patterns of these ratings provide some useful information, for
example showing that most instruments are relatively weak in terms of
providing validity and other technical evidence, considerable caution should
be exercised in interpreting specific ratings. For example, while Hoepfner
and his colleagues apply a simple rating system to all tests reviewed,
different possible applications call for different weight to be given to
the various attributes of an instrument.* Hoepfner et al.'s CSE Elementary
School Test Evaluations (1976) covers instruments appropriate for grades
1-6, but the caution suggested with respect to the 1971 volume is relevant
to this volume also.

* See Haney et al., 1978, pp. 110-111, for a discussion of some of the
drawbacks in the CSE approach to rating test quality.

Johnson's Tests and Measurements in Child Development: Handbook II
(1976) describes nearly 900 unpublished measures of child behavior.
Measures are classified into the following categories: (1) cognition,
(2) personality and emotional characteristics, (3) perceptions of
environments, (4) self-concept, (5) qualities of care given and home environment,
(6) motor skills and sensory perceptions, (7) physical attributes,
(8) attitude and interests, (9) social behavior, and (10) vocational.
Listings for each measure include identifying information, description
of the measure, reliability and validity information, and bibliography.
An earlier edition of this book, organized along similar lines, was
Johnson and Bommarito's Tests and Measurements in Child Development (1971),
which listed around 300 unpublished instruments.

Walker's Socioemotional Measures for Preschool and Kindergarten
Children (1973) describes 143 instruments designed to assess social and
emotional characteristics of young children. Each is described in terms of
identifying information, general description, norms, validity and
reliability information.

Buros' Mental Measurements Yearbooks (MMYs) are clearly the single
best general source of information on specific tests and instruments.
Eight MMYs have been published since 1938, but the Sixth MMY
(1965), the Seventh MMY (1972) and the Eighth MMY (1978) are the only
volumes with information relevant to most currently used tests and
instruments. Buros' Tests in Print I (1961) and II (1974) provide comprehensive
indexes to previously published Yearbooks. Tests in Print II also includes
a reprint of the APA, AERA, & NCME Standards for Educational and Psychological
Tests (APA, AERA, & NCME, 1974). Buros' Yearbooks deal with a wide range
of tests besides early childhood instruments, but the Seventh MMY, for
example, describes more than 500 instruments appropriate for children in
the prekindergarten to grade one age range. The Buros volumes are
especially helpful in comparison to others because they provide critical
reviews of most of the tests listed.



Other recommended sources on general issues in early childhood assessment
are Bradley and Caldwell (1974) on issues of testing young children;
Cazden (1971) and Kamii (1971), both dealing mainly with assessment and
evaluation at the preschool level; Raizen and Bobrow (1974) concerning
evaluation of social competence development in Head Start; and White et al.
(1973) concerning a wide range of federal programs for young children and
research and evaluation of these programs.

Three sources recommended as examples of what can be learned by
pilot-testing instruments on a small-scale basis prior to full-scale use are
Cicourel et al., 1974; Haney and Scott, 1980; and Haney et al., 1981.
Cicourel et al. (1974), particularly Chapter 5, describes an analysis of how
first grade students arrived at answers to a reading test, on the basis of
interviews with children after they had taken the test under standard
conditions. Haney and Scott (1980) describe a similar analysis of how second- and
third-grade children perceived and reasoned about reading, science and
social studies test questions from four of the most commonly used standardized
achievement test series. For a more specific example of how a readiness
instrument was pilot-tested with kindergarten children and revised on
the basis of pilot-study findings, see Haney et al. (1981), pp. 48-50.

OBSERVATIONAL APPROACHES TO EARLY CHILDHOOD ASSESSMENT 

A variety of sources of information regarding observational approaches 
to early childhood assessment are available.

Almy and Genishi's Ways of Studying Children (1979), subtitled "An
Observational Manual for Early Childhood Teachers," provides a good discussion
of alternative ways of observing children. Specifically discussed are
studying the way children think, asking children about themselves, studying
children in groups, studying the ways children express themselves, and studying
the child through others. Though the book is aimed primarily at teachers,
it also would be of value to anyone interested in observational approaches
to child study.

Boehm and Weinberg's The Classroom Observer (1977) provides a good
introduction to systematic classroom observation. This book aims at helping
readers "derive valid and reliable information about children in their
natural habitat through the correct and relevant use of observational
strategies" (p. xi). After an introduction concerning the selective
nature of observation, the book discusses (1) defining the problem and
describing the setting, (2) labeling and categorizing behavior,
(3) sampling and recording behavior, (4) the teacher as observer, (5) the
relationship between media and observation, and (6) applying observation
skills to education.

Borich and Madden's Evaluating Classroom Instruction: A Sourcebook
of Instruments (1977) reviews almost 170 instruments relevant to evaluation
of classroom instruction. These include rating scales, checklists,
observational coding systems, and self-report questionnaires. A variety of types of
instruments are reviewed because the authors seek to "encourage multivariate
methods of research" (p. 6). Instruments reviewed are organized according
to who (teacher, pupil, or observer) provides information about whom
(the teacher, the pupil, or the classroom). Each instrument is described
in terms of general information and description, illustration of sample
items and response functions, psychometric characteristics, norms,
administration and scoring, comments and references. Though the range of
instruments described is not limited to the early childhood levels, several
sections of the book (especially IIA About the Pupil from the Teacher,
IIC About the Pupil from an Observer, IIIA About the Classroom from the
Teacher, and IIIC About the Classroom from an Observer) describe instruments
relevant to early childhood assessment.

Boyer, Simon, and Karafin's Measures of Maturation: An Anthology of
Early Childhood Observation Instruments (1973) describes more than 70
observation systems for use in observing and recording behaviors of infants and
young children. Each is described in terms of rationale and purpose,
dimensions of the system, instructions for use, and references and related
research.

Carini's monograph The Art of Seeing and the Visibility of the Person
(1979) describes "a metaphysics of observing and presents a method for
gathering and organizing empirical observation in order to disclose
meaning" (p. 7). Rather than focusing on particular observational
techniques, this monograph aims at describing the art of observation and
reflection on children through time so as to derive portrayals that
"disclose the continuity and transformation in [their] thinking as these are
revealed in their projects and activities, in such . . . mediums of
expression as drawing, building, and writing."

Goodwin and Driscoll's Handbook (1980), described in general above,
provides a useful introduction to observational approaches to early
childhood assessment in Chapter Four. The rationale behind this chapter is that
"Carefully conceptualized and applied observational procedures can complement
other measures available for use in various settings" (p. m). This
chapter outlines the importance of observational measurement in early
childhood assessment, describes formal and informal approaches to
observational measurement, recounts the general advantages and limitations of
observational measurement, and illustrates three hypothetical applications
of observational measurement.



Stallings' Learning to Look: A Handbook on Classroom Observation and
Teaching Models (1977) provides another useful introduction to
observational assessment in general and to one observational instrument in
particular, the SRI classroom observation system. This system was developed in
the course of the national evaluation of Project Follow Through, and
consists of three instruments: the physical environment interaction form,
the classroom checklist, and the five-minute observation form. Stallings'
book also describes five different models of early elementary education of
the sort included in the Follow Through Program (the exploratory, group
process, developmental cognitive, programmed, and fundamental
models) and briefly reviews evidence from the Follow Through evaluation on
how children grow and develop in each of those models.

On more technical issues regarding observational measurement generally, 
see Garner (I960), Guilford et al. (1962), Wright (1967), Medley and Mitzel 
(1963), Hutt and Hutt (1970), and Borich et al. (1977). 

SPECIAL ISSUES

The reader may also wish to pursue some of the special topics mentioned
in this booklet through other readings.

On criterion-referenced measurement, Popham (1978) presents a good
introduction. Hambleton and Eignor (1978) describe a set of guidelines for
possible use in evaluating criterion-referenced tests and test manuals.
Berk (1980) and Hambleton et al. (1978) provide useful reviews of a variety
of technical issues in criterion-referenced measurement.

Regarding the use of systematic measurement with special groups
(ethnic minority children, those who do not speak English as a first language,
or otherwise special individuals), several sources are helpful. Miller (1974),
Oakland (197*), and Hilliard (1979) provide useful reviews of issues in
the assessment of black and minority children generally. Padilla (1979)
provides a similar review with respect to Hispanic Americans. Flaugher
(1978) provides a useful review of the many definitions of test bias, and
Petersen and Novick (1976) give a good review of alternative conceptions of
fairness in selection testing. Hobbs (1975) presents a broader discussion
of the use of assessment results in classifying and labelling of children.

OTHER SOURCES OF INFORMATION

All of the sources mentioned above are of somewhat limited value in 
that they are printed material, and as such may become outdated with the 
passage of time. Hence, let us also recommend several institutional sources 
which may be useful in that they provide information on a variety of topics 
on an ongoing basis. 

ERIC. The Educational Resources Information Center (ERIC) network is 
one of the most valuable of such sources of ongoing information. The ERIC 
system encompasses a computerized information retrieval system covering a 
wide variety of educational materials, both published and unpublished. A 
description of the ERIC system is available in NIE's publication ERIC: A
Profile, and suggestions on how to use the ERIC system are provided in Brown,
Sitts, and Yarborough (1975) and Simmons (1975). The ERIC system is based 
on 16 ERIC clearinghouses which collect, evaluate, and distribute information 
concerning particular topical areas. Three ERIC clearinghouses relevant 
to early childhood assessment, with notes on the scope of areas they cover, 
are: 



ERIC Clearinghouse on the Disadvantaged
Columbia University, Teachers College
Box 40
525 W. 120th Street
New York, New York 10027
Telephone: (212) 678-3780

[...] experiences and environments, from birth onward; [...] performance of
disadvantaged children and youth from grade 3 through college entrance;
programs and practices [...] experiences [...] to compensate [...]; programs
and practices related (1) to economic [...] segregation, desegregation, and
integration in education [...] curriculum imbalance in the treatment [...].

ERIC Clearinghouse on Early Childhood Education
University of Illinois
College of Education
805 W. Pennsylvania Avenue
Urbana, Illinois 61801
Telephone: (217) 333-1386

Prenatal factors, parental behavior; the physical, psychological, social,
educational and cultural development of children from birth through [...].

ERIC Clearinghouse on Tests, Measurement, and Evaluation
Princeton, New Jersey 08540
Telephone: (609) 921-9000 ext. 2182

[...] measurement devices; evaluation procedures and techniques;
[...] measurement, or evaluation in educational projects [...].
More general information on the ERIC system and its other clearinghouses is
available from:

Educational Resources Information Center
(Central ERIC)
National Institute of Education
Washington, D.C. 20208
Telephone: (202) 254-5040

ETS Head Start Test Collection. The Educational Testing Service also
administers the Head Start Test Collection, which was established to provide
information about assessment instruments concerning children from birth to
nine years of age. Qualified persons working in the area of early
childhood education may have access to the collection in person or via mail or
phone inquiries. The collection also publishes a series of bibliographies
on special early childhood assessment topics, which include:

• Self Concept Measures: An Annotated Bibliography (ED 051 305)

• Language Development Tests: An Annotated Bibliography (ED 056 082)

• School Readiness Measures: An Annotated Bibliography (ED 056 083)

• Tests for Spanish-Speaking Children: An Annotated Bibliography
  (ED 056 084)

• Measures of Social Skills: An Annotated Bibliography (ED 056 085)

• Assessing the Attitudes of Young Children Toward School (A
  State-of-the-Art Paper) (ED 056 086)

• Measures of Infant Development: An Annotated Bibliography
  (ED 058 326)

For copies of these bibliographies or further information on the Head Start
Test Collection, write to:

Head Start Test Collection
Educational Testing Service
Princeton, N.J. 08540

Title I Technical Assistance Centers (TACs). The TACs serving the ten
regional areas of the United States are also sources of information on
educational assessment, particularly with respect to Title I evaluation.

Region I: Connecticut, Maine, Massachusetts, New Hampshire,
Rhode Island, and Vermont

-RMC Research Corporation
400 Lafayette Road
Hampton, N.H. 03842
Telephone: (603) 926-8888
436-5385

Region II: New York, New Jersey, Puerto Rico, and the Virgin
Islands

-Educational Testing Service
Princeton, N.J. 08540
Telephone: (609) 734-5117



Region III: Delaware, Maryland, Pennsylvania, Virginia,
West Virginia, and the District of Columbia

-NTS Research Corporation
2634 Chapel Hill Blvd.
Durham, N.C. 27707
Telephone: (919) 493-3451
(800) 334-0077

Region IV: Florida, Georgia, Kentucky, Mississippi,
North Carolina, South Carolina, and Tennessee

-Educational Testing Service
Southern Regional Office
250 Piedmont Avenue
Suite 2020
Atlanta, Georgia 30326
Telephone: (404) 524-4501

Region V: Illinois, Indiana, Michigan, Minnesota,
Ohio, and Wisconsin

-Educational Testing Service
1 American Plaza
Evanston, Illinois 60201
Telephone: (312) 869-7700

Region VI: Arkansas, Louisiana, New Mexico,
Oklahoma, and Texas

-Powell Associates
3724 Jefferson
Suite 205
Austin, Texas 78731
Telephone: (512) 453-7288
(800) 531-5239

Region VII: Iowa, Kansas, Missouri, and Nebraska

-American Institutes for Research
P.O. Box 1113
Palo Alto, CA 94302
Telephone: (415) 494-0224

Regions VIII, IX, and X: Colorado, Montana, North Dakota, South Dakota, Utah,
and Wyoming (Region VIII); Arizona, California, Hawaii,
Nevada, Guam, Trust Territory of the Pacific Islands,
and American Samoa (Region IX); and Alaska, Idaho,
Oregon, and Washington (Region X)

-Northwest Regional Laboratory
710 S.W. Second Avenue
Portland, Oregon 97204
Telephone: (503) 248-6853




Appendix 2 

LISTING OF SELECTED EARLY CHILDHOOD INSTRUMENTS 
AND SOURCES OF REVIEW INFORMATION ON EACH 

This appendix lists over 100 early childhood assessment instruments and
sources of review information on each one. For each instrument listed, the
following information is given:

Title 
Type 

Publisher 
Copyright date(s) 

Grade or age span for which intended 

Sources of review information on the instrument . 

The titles listed are ones which have been indicated to have been used
in ECT-I programs in the past, for screening, needs assessment or program 
evaluation. The listing of particular titles does not constitute agency 
endorsement of those instruments, nor does it mean that others should not 
be considered. 

The types used to describe instruments are drawn mainly from the series
of test review volumes written by Oscar Buros, the most widely known source
of review information on tests. However, it should be noted that other
authorities often use other typologies to describe types of tests. Some of
the instruments listed by Buros as measuring personality characteristics,
for example, often are described by others as measuring affective
characteristics.

Since the publishers frequently update information on their instruments
or produce altogether revised versions, at the back of this appendix we have
listed the addresses of publishers who have issued tests intended to be
useful for assessment at the early childhood level. If one is seriously
considering use of a particular instrument, it is advisable to write to the
publisher to obtain up-to-date information.

The sources of review information on the instruments listed refer to
the following publications.



T2: Buros, O. K. (Ed.). Tests in print II. Highland Park, N.J.:
Gryphon Press, 1974.

7MMY: Buros, O. K. (Ed.). The seventh mental measurements yearbook.
Highland Park, N.J.: Gryphon Press, 1972.

8MMY: Buros, O. K. (Ed.). The eighth mental measurements yearbook.
Highland Park, N.J.: Gryphon Press, 1978.

CSE-ECRC: Hoepfner, R., Stern, C., & Nummedal, S. (Eds.). CSE-ECRC
preschool/kindergarten test evaluations. Los Angeles, CA:
Center for the Study of Evaluation and Early Childhood
Research Center, 1971.

CSE: Hoepfner, R. et al. CSE elementary school test evaluations.
Los Angeles, CA: Center for the Study of Evaluation, 1976.

J+B: Johnson, O. G., & Bommarito, J. Tests and measurements in child
development. San Francisco, CA: Jossey-Bass, 1971.

OJ: Johnson, O. G. Tests and measurements in child development:
Handbook II. San Francisco, CA: Jossey-Bass, 1976.

HJ: Johnson, H. W. Preschool test descriptions. Springfield, Illinois:
Charles C Thomas, 1979.

W: Walker, D. K. Socioemotional measures for preschool and
kindergarten children. San Francisco, CA: Jossey-Bass, 1973.



Numbers given for T2, 7MMY, and 8MMY refer to test entry numbers. Those
for J+B, OJ, HJ, and W are page references. No page references are provided
for CSE and CSE-ECRC because in these volumes, information on particular
instruments typically is spread across a fair number of pages.



75 



9 

ERIC 



-75- 



EARLY CHILDHOOD ASSESSMENT INSTRUMENTS 

ABC Inventory to Determine Kindergarten and School Readiness
Research Concepts [Educational Studies & Development]

Entrants to kindergarten or grade 1

T2: 1691

7MMY: 739

CSE-ECRC: K6

J+B: 27-28

HJ: 25-26

American School Achievement Tests
Achievement 

Bobbs Merrill Co., Inc. 
1941-75 

Grades 1, 2-3, 4-6, 7-9 

8MMY: 4 

CSE: 

HJ: 31-32 

Houghton Mifflin Co. 
1969-72 

Kindergarten - 1 
8MMY: 796 
CSE: 

Animal Crackers: A Test of Motivation to Achieve

Personality

CTB/McGraw-Hill
1973-75 

Preschool - grade 1 

8MMY: 497 

CSE: 

Basic School Skills Inventory 
Miscellaneous: Learning Disabilities 
Follett Publishing Co. 
1975 

Ages 4-6 
8MMY: 424 
CSE: 

HJ: 45-46 

Bayley Scales of Infant Development

Intelligence - Individual

Psychological Corporation 
1969 

Ages 2-30 months 
8MMY: 206 

7MMY: 402
HJ: 47-48 






Bender-Gestalt Test
Personality

American Orthopsychiatric Association, Inc.
1938-48

Ages 4 and over
8MMY: 506
7MMY: 161
HJ: 51-52

Bilingual Syntax Measure 
Foreign Language - Spanish
Psychological Corporation
1973-76

Bilingual children, kindergarten - grade 2
8MMY: 156



Boehm Test of Basic Concepts
Intelligence - Group 
Psychological Corporation 
1967-71 

Kindergarten - grade 2 

8MMY: 178 
7MMY: 335 
CSE-ECRC: 
CSE: 36 
HJ: 55-56 






Botel Reading Inventory
Reading - Miscellaneous 
Follett Educational Corporation 
1961-70 

Grades 1-4, 1-6, 1-12 
T2: 1658 
7MMY: 727 
CSE: 

California Achievement Tests 

Achievement 

CTB/McGraw-Hill

1934-74

Grades 1.5-2, 2-4, 4-6, 6-9, 9-12
8MMY: 10
7MMY: 5 
CSE: 

California Preschool Social Competency Scale

Personality 

Consulting Psychologists Press, Inc. 
1969 

Ages 2.5 - 5.5 

8MMY: 513

7MMY: 48 

W: 261-262 

CSE-ECRC: 

HJ: 67-68 






California Short Form Test of Mental Maturity

Intelligence - Group

CTB/McGraw-Hill
1938-65

Kindergarten - 1.4, 1.5-3.4, 3-4, 4-6, 6-7, 7-8, 9-12, 12-16, adults

7MMY: 337
CSE-ECRC: 
CSE: 32 



Children's Embedded Figures Test
Personality 

Consulting Psychologists Press, Inc. 

1963-71 

Ages 5-12

8MMY: 519

7MMY: 53

CSE: 

Children's Self-Social Constructs Test
Interests or Preferences
Virginia Research Associates 
1967 

Preschool 
W: 141-142 
HJ: 81-82 



Circus 



Readiness, language, and motor development
Addison-Wesley Publishing Company, Inc.

Preprimary to K.5 (Circus A), K.5 - 1.5 (Circus B), Primary to 1.5-2.5



Cognitive Abilities Test 
Intelligence - Group
Houghton Mifflin
1954-74

Kindergarten - 1, 2-3, 3-12

8MMY: 181

7MMY: 343 

CSE-ECRC: K17 

CSE: 

Cognitive Skills Assessment Battery 
Reading Readiness
Teachers College Press

1974

Prekindergarten
8MMY: 797 






Columbia Mental Maturity Scale 
Intelligence - Individual
Psychological Corporation 
1954-72 

Ages 3-6 to 9-11 
8MMY: 210 
CSE-ECRC 
CSE: 

HJ: 87-88 

Comprehensive Identification Process 
Miscellaneous - Learning Disabilities 
Scholastic Testing Service, Inc. 
1975

Ages 2.5 - 5.5
8MMY: 425 
HJ: 91-92 

Comprehensive Test of Basic Skills 
Achievement 

CTB/McGraw-Hill
1968-76

Kindergarten.6-1.9, 1.6-2.9, 2.5-4.9, 4.5-6.9,
8MMY: 12
7MMY: 9
CSE: 

Cooperative Preschool Inventory 
Intelligence - Individual

Cooperative Tests & Services

1965-70

Disadvantaged children ages 3-6
T2: 490
7MMY: 404
HJ: 95 



Cooperative Primary Tests
Achievement

ETS; Addison-Wesley Publishing Company, Inc.
1965-67



Grades 1.5 - 2.5, 2.5 - 3 
8MMY: 13 
7MMY: 10 
CSE: 






Denver Developmental Screening Test
Intelligence - Individual

LADOCA Project and Publishing Foundation
1968-70

Ages 2 weeks - 6 years

T2: 492

7MMY: 405

J+B: 32-33

HJ: 99-100 

Detroit Tests of Learning Aptitude

Intelligence - Individual

Bobbs-Merrill Co., Inc.
1935-75

Ages 3 and over
8MMY: 213
7MMY: 406
CSE-ECRC 

CSE: 

Developmental Test of Visual Perception
Vision

Consulting Psychologists Press, Inc.
1966

Ages 3-8

8MMY: 882
HJ: 115-116



Developmental Test of Visual-Motor Integration

Sensory-Motor

Follett
1967

Ages 2-8, 2-15
8MMY: 870

7MMY: 867

CSE-ECRC: 

CSE: 

HJ: 113-114 



Diagnostic Reading Scale
Reading - Diagnosis
CTB/McGraw-Hill
1963-75

Grades 1-6 and poor readers in grades 7-12

8MMY: 753

7MMY: 717

CSE:



Draw-A-Person
Character - Projective
Western Psychological Services
1963

Ages 5 and over
T2: 1455
7MMY: 165 




Durrell Analysis of Reading Difficulty
Reading - Diagnostic
Harcourt Brace Jovanovich, Inc.
1933-53
Grades 1-6
T2: 1628
CSE: 

Durrell Listening-Reading Series
Reading - Miscellaneous
Harcourt Brace Jovanovich, Inc. 
1969-70 

Grades 1-2, 3-6, 7-9 
T2: 1660 
7MMY: 728 
CSE: 

Durrell-Sullivan Reading Capacity and Achievement Test

Reading - Miscellaneous

Harcourt Brace Jovanovich, Inc.
1937-45

Grades 2.5 - 4.5, 3-6
T2: 1661 

Gates-MacGinitie Reading Tests
Reading 

Houghton Mifflin Co. 
1926-72 

Grades 1, 2, 3, 2.5-3, 4-6, 7-9 

8MMY: 726A

7MMY: 689 

CSE-ECRC: 

CSE: 

Goodenough-Harris Drawing Test 
Intelligence - Group
Psychological Corporation 
1926-63 
Ages 3-15 
8MMY: 187 
7MMY: 352 
CSE-ECRC: 
CSE: 
HJ: 

Gray Oral Reading Test 
Reading - Oral 
Bobbs-Merrill Co., Inc.
1963-67

Grades 1-16 and adults
T2: 1681
CSE: 






Illinois Test of Psycholinguistic Abilities

Miscellaneous - Learning Disabilities

University of Illinois Press

1961-68

Ages 2-10

8MMY: 431

7MMY: 442 

CSE-ECRC: 

CSE: 

HJ: 138-139 

Individualized Criterion-Referenced Testing

Reading - Diagnosis

Educational Development Corporation

Grades 1, 2, 3, 4, 5, 6, 7, 8

Individualized Criterion-Referenced Testing
Mathematics

Educational Development Corporation
1973-77
Grades 1, 2, 3, 4, 5, 6, 7, 8
8MMY: 275

Iowa Tests of Basic Skills
Achievement
Houghton Mifflin Co.
1955-73

Grades 1.7-2.5, 2.6-3.5, 3-9
8MMY: 19 
7MMY: 481 

Key Math Diagnostic Arithmetic Test 

Mathematics - Arithmetic

American Guidance Service 
1971-76 

Preschool - Grade 6 

8MMY: 305

CSE: 

HJ: 146-147 

Kindergarten Auditory Screening Test

Follett Publishing Co.

1971

Kindergarten - Grade 1

8MMY: 940

CSE: 

HJ: 148-149 






Lee-Clark Reading Readiness Test
Reading - Readiness
CTB/McGraw-Hill
1931-62

Kindergarten - Grade 1

T2: 1563

7MMY: 752

CSE-ECRC 

CSE: 

McCarthy Scales of Children's Abilities
Intelligence - Individual
Psychological Corporation
1970-72

Ages 2.5 - 8.5
8MMY: 219
CSE:

HJ: 172-173

Meeting Street School Screening Test 
Miscellaneous - Learning Disabilities
Crippled Children & Adults of Rhode Island, Inc.

1969

Kindergarten - Grade 1
8MMY: 435

7MMY: 756

CSE:

HJ: 174-175

Metropolitan Achievement Test 
Achievement

Psychological Corporation
1931-73

Kindergarten.7-1.4, 1.5-2.4, 2.5-3.4, 3.5-4.9, 5.0-6.9, 7.0-9.5
8MMY:
7MMY: 14
CSE:

Monroe Reading Aptitude Tests
Reading - Readiness
Houghton Mifflin
1935-63
Kindergarten - Grade 1
T2: 1724
CSE-ECRC:

Murphy-Durrell Reading Readiness Analysis
Reading - Readiness
Psychological Corporation
1947-65

First grade entrants

8MMY: 803

7MMY: 758
CSE-ECRC:



Otis-Lennon Mental Ability Test
Intelligence - Group
Psychological Corporation
1936-70

Kindergarten, 1.0-1.5, 1.6-3.9, 4-6, 7-9, 10-12
8MMY: 198

7MMY: 370

CSE-ECRC: 

CSE: 

Peabody Individual Achievement Test 
Achievement

American Guidance Service
1970

Kindergarten - 12
8MMY: 24
7MMY: 17
CSE-ECRC:
CSE: 

Peabody Picture Vocabulary Test 

Intelligence - Individual

American Guidance Service
1959-65
Ages 2.5 - 18
8MMY: 222
7MMY: 417
CSE-ECRC:
CSE:
HJ: 191-192

Pictorial Test of Intelligence
Intelligence - Individual
Houghton Mifflin Co.
1964

Ages 3-8

8MMY: 223
7MMY: 418 
CSE-ECRC: 
CSE: 

Preschool Embedded Figures Test 
Personality - Nonprojective
Consulting Psychologists Press, Inc.

Ages 3-5
T2: 1331 



Preschool Interpersonal Problem-Solving Test
Personality

Myrna Shure and George Spivack
NA

Ages 4-5
OJ: 565-567






Prescriptive Reading Inventory
Reading - Diagnosis
CTB/McGraw-Hill
1972-77

Kindergarten.0-1.0, Kindergarten.5-2.0, 1.5-2.5, 2.0-3.5, 3.0-4.5, 4.0-6.
8MMY: 769

Primary Academic Sentiment Scale 
Reading - Readiness 
Priority Innovations, Inc. 
1968

Ages 4-4 to 7-3 

T2: 1723 
7MMY: 760 
W: 147, 212 
CSE-ECRC 

Primary Mental Abilities Test

Multi-aptitude

Science Research Associates

1946-65

Kindergarten-1, 2-4, 4-6, 6-9, 9-12

8MMY: 488

T2: 1087

CSE-ECRC:

CSE:

HJ: 213-214 



SRA Achievement Series 
Achievement 

Science Research Associates 
1954-69

Grades 1-2, 2-4, 3-4, 4-9
7MMY: 18

T2: 731, 108, 1596, 1790, 1947, 1765

SRA Assessment Survey 
Achievement 

Science Research Associates 
1954-75

Grades 1-2, 2-4, 4-5, 6-7, 8-9

8MMY: 1

CSE:

Santa Clara Inventory of Developmental Tasks 
Readiness 

Richard L. Zweig Associates, Inc. 
1974 

Ages Preschool, 5-5.5, 6-6.5, 7
No references 






School Readiness Test
Reading - Readiness
Scholastic Testing Service, Inc. 
1974-77 

Kindergarten - Grade 1 
8MMY: 808-9 

Screening Test for Auditory Comprehension of Language

Miscellaneous - Listening

Learning Concepts
1973

Ages 3-6

8MMY:
OJ: 223
CSE: 41
HJ: 235-236

Screening Test of Academic Readiness 
Reading - Readiness 
Priority Innovations, Inc.
1966 

Ages 4-0 to 6-5 

T2: 1730 
7MMY: 765 
CSE-ECRC: 
HJ: 239-240 

The Self-Concept and Motivation Inventory: What Face Would You Wear?

Personality

Person-O-Metrics, Inc.
1967-77

Age 4-kindergarten, Grades 1-3, 3-6, 7-12

8MMY: 670 

OJ: 722-23 

W: 249 

CSE: 

Short Form Test of Academic Aptitude
Intelligence - Group
CTB/McGraw-Hill
1936-74

Grades 1.5-3.4, 3.5-4, 5-6, 7-9, 9-12
8MMY: 202
7MMY: 387 
CSE: 

Short Test of Educational Ability 

Intelligence - Group 

Science Research Associates, Inc. 

1966-70 

Kindergarten-1, 2-3, 4-6, 7-8, 9-12 

7MMY: 382 

CSE-ECRC: 

CSE: 




Slosson Intelligence Test 

Intelligence - Individual 

Slosson Educational Publications, Inc.

1961-63 

Ages 2 weeks and over 

8MMY: 227 

7MMY: 424 

CSE-ECRC: 

CSE: 

HJ: 243-244 

Slosson Oral Reading Test 
Reading - Oral 

Slosson Educational Publications, Inc. 
1963 

Grades 1-8 and high school

T2: 1688 

CSE: 

Stanford Achievement Test 
Achievement 

Psychological Corporation 
1923-75

Grades 1.5-2.4, 2.5-3.4, 3.5-4.4, 4.5-5.4, 5.5-6.9, 7.0-9.5

7MMY: 25

CSE: 

HJ: 

Stanford-Binet Intelligence Scale
Intelligence - Individual
Houghton Mifflin Co.
1916-73

Ages 2 and over
8MMY: 229
7MMY: 425 
CSE-ECRC: 

Stanford Diagnostic Mathematics Test
Mathematics
Psychological Corporation
1976

Grades 1.5-4.5, 3.5-6.5, 5.5-8.5, 7.5-13
8MMY: 292

Stanford Diagnostic Reading Test 
Reading - Diagnosis 
Psychological Corporation 

1966-76
Grades 1.5-3.5, 2.5-5.5, 4.5-9.5, 9-13
8MMY: 777 
7MMY: 725 






Stanford Early School Achievement Test
Achievement

Psychological Corporation
1969-71

Kindergarten - 1.1, 1.1 - 1.8

8MMY: 30

7MMY: 28

CSE-ECRC: 

CSE: 

HJ: 249-250 

Steinbach Test of Reading Readiness

Reading - Readiness

Scholastic Testing Service, Inc.
1965-66 

Kindergarten - Grade 1 

T2: 1732 

CSE-ECRC: 

Templin-Darley Tests of Articulation
Speech & Hearing - Speech

Bureau of Educational Research and Service
1960-69

Ages 3 and over
T2: 2095
7MMY: 972
CSE-ECRC: 
HJ: 253-254 

Test of Language Development 
Speech & Hearing - Speech
Empiric Press
1977

Ages 4-0 to 8-11
8MMY: 978

Test of Nonverbal Auditory Discrimination

Speech & Hearing - Hearing

Follett Publishing Co.
1968-75

Kindergarten - 3
8MMY: 950

OJ: 947



Tests of Basic Experiences 
Achievement 
CTB/McGraw-Hill
1970-73

Preschool - Kindergarten (Level K), Kindergarten - Grade 1 (Level L)

8MMY:

7MMY: 33

CSE-ECRC:

CSE:

HJ: 257-258



Valett Developmental Survey of Basic Learning Abilities

Reading - Readiness

Consulting Psychologists Press, Inc.
1966

Ages 2-7

T2: 991
7MMY: 767
CSE-ECRC: 
CSE: 

HJ: 267-268 



Vineland Social Maturity Scale 
Personality 

American Guidance Service 
1935-65

Birth to maturity

8MMY: 703

W: 301-302 

CSE-ECRC: 

CSE: 

HJ: 273-274 



Walker Readiness Test for Disadvantaged Preschool Children
Readiness
Wanda H. Walker
Age 4 to 6 years
OJ: 154-155

Wechsler Intelligence Scale for Children

Intelligence - Individual

Psychological Corporation

1949-74

Ages 5-16

8MMY: 232

7MMY: 431

CSE-ECRC: 

CSE: 

Wechsler Preschool and Primary Scale of Intelligence

Intelligence - Individual

Psychological Corporation

1949-67

Ages 4-5.5

8MMY: 234

7MMY: 434

CSE-ECRC: 

CSE: 






Wide Range Achievement Test 
Achievement

Guidance Associates of Delaware, Inc.
1940-76

Ages 5-11, 12 and over

8MMY: 37

7MMY: 36

CSE-ECRC:

CSE: 

HJ: 279-280 

Woodcock Reading Mastery Test 
Reading - Diagnosis 
American Guidance Service
1972-73

Kindergarten - 12
8MMY: 779
CSE: 






EARLY CHILDHOOD TEST INSTRUMENT PUBLISHERS 

Addison-Wesley Publishing Co., Inc. 

2725 Sand Hill Road 

Menlo Park, California 94025 

American Guidance Service, Inc. 

Publishers' Building 

Circle Pines, Minnesota 55014 

American Orthopsychiatric Association, Inc.
1775 Broadway, New York, New York 10019

Bobbs-Merrill Co., Inc. (The) 
4300 West 62nd Street 
Indianapolis, Indiana 46268 

Bureau of Educational Research and Service 
University of Iowa 
Iowa City, Iowa 52242

CTB/McGraw-Hill

Del Monte Research Park

Monterey, California 93940 

Consulting Psychologists Press, Inc. 

577 College Avenue 

Palo Alto, California 94306 

Cooperative Tests and Services 

c/o Addison-Wesley Publishing Co., Inc. 

2725 Sand Hill Road 

Menlo Park, California 94025 

Crippled Children and Adults of Rhode Island, Inc.

Meeting Street School

667 Waterman Avenue

East Providence, Rhode Island 02914

Educational Development Corporation 
P.O. Box 45663 
Tulsa, Oklahoma 74145

Educational Testing Service 
Princeton, N.J. 08540 

Empiric Press
333 Perry Brooks Building
Austin, Texas 7870* 






Follett Publishing Co. 

1010 West Washington Boulevard 

Chicago, Illinois 60607 

Guidance Associates of Delaware, Inc. 
1526 Gilpin Avenue 
Wilmington, Delaware 19806 

Harcourt Brace Jovanovich, Inc.

757 Third Avenue 

New York, New York 10017 

Houghton Mifflin Company 

1 Beacon Street 

Boston, Massachusetts 02107 

Learning Concepts 
2501 North Lamar 
Austin, Texas 78705 

Person-O-Metrics, Inc. 
20504 Williamsburg Road 
Dearborn Heights, Michigan 48127 

Priority Innovations, Inc. 

P.O. Box 792 

Skokie, Illinois 60076 

Psychological Corporation (The) 

757 Third Avenue 

New York, New York 10017 


Research Concepts 
1368 East Airport Road 
Muskegon, Michigan 49444 

Richard Zweig Associates, Inc. 

20800 Beach Boulevard 

Huntington Beach, California 92648

Scholastic Testing Service, Inc. 
480 Meyer Road 
Bensenville, Illinois 60106 

Science Research Associates, Inc. 
155 North Wacker Drive 
Chicago, Illinois 60606 






M. Shure and G. Spivack
Community Mental Health
Mental Retardation Center
Department of Mental Health Sciences
Hahnemann Medical College and Hospital
Philadelphia, Pennsylvania 19102 

Slosson Educational Publications, Inc.

140 Pine Street

East Aurora, New York 14052

Teachers College Press
1234 Amsterdam Avenue 
New York, New York 10027 

University of Illinois Press 
Urbana, Illinois 61801 

Virginia Research Associates, Ltd. 
P.O. Box 5501
Charlottesville, Virginia 22902

Wanda Walker

Northwest Missouri State College 
Maryville, Missouri 64468

Western Psychological Services

12031 Wilshire Boulevard 

Los Angeles, California 90025 






Appendix 3 

ANNOTATIONS ON EARLY CHILDHOOD INSTRUMENTS 

This appendix provides brief annotations concerning five additional
instruments:

Animal Crackers 
CTBS Readiness Test 

Santa Clara Inventory of Developmental Tasks 
Preschool Inventory 

Wechsler Preschool and Primary Scale of Intelligence

These annotations are provided simply to illustrate the type of information
useful in initially screening instruments for possible use. The fact
that particular instruments are listed here should not be interpreted as an
endorsement.






Annotation Form 

TITLE: Animal Crackers: A Test of Motivation to Achieve

FORMS:

COPYRIGHT: 1973

AUTHOR: Dorothy C. Adkins & Bonnie L. Ballif

PUBLISHER: CTB/McGraw-Hill

SOURCE: CTB/McGraw-Hill, Del Monte Research Park
Monterey, CA 93940

PRICE AS OF 1980: $18.60 



I. DESCRIPTIVE INFORMATION 

TYPE OF TEST: personality test 

INTENDED USE: to assess achievement motivation, how the child feels 
about himself in the school situation and whether or not learning is 
important to him 

INTENDED POPULATION: preschool, kindergarten, and first grade 

ITEM FORMAT: objective-projective technique (the child chooses between 
alternative behaviors or attitudes, described orally). Each item 
consists of an illustration of two identical animals and two oral 
descriptions. The child is told that he has his "own" animals which
look like the others but behave as he does. As the examiner points
to each animal in turn and describes it, the child identifies his
own animal.


CONTENT: School enjoyment 
Self-confidence 
Purposiveness 
Instrumental activity 
Self-evaluation



II. REFERENCES

1. Adkins, Dorothy C. & B.L. Ballif. Examiner's Manual, Research Edition:
Animal Crackers, A Test of Motivation to Achieve. Monterey,
CA: CTB/McGraw-Hill, 1973.

2. Weintraub, S. Review in Buros' Eighth Mental Measurements Yearbook,
pp. 693-694.






Annotation Form

TITLE: CTBS Readiness Test

FORMS: Level A, Form S

COPYRIGHT: 1977

AUTHOR: CTB/McGraw-Hill

PUBLISHER: CTB/McGraw-Hill

SOURCE: CTB/McGraw-Hill, Del Monte Research Park, Monterey, CA 93940

PRICE AS OF 1980: $5.90, specimen set 

I. DESCRIPTIVE INFORMATION

TYPE OF TEST: readiness test 






INTENDED USE: "to help kindergarten and first grade teachers and
supervisors determine if their students have the skills necessary
for beginning reading"; to diagnose strengths and needs in particular
skill areas; to predict success in reading

INTENDED POPULATION: Grades K.0 - 1.3

ITEM FORMAT: multiple choice (children fill in circle corresponding to 
correct choice) 

CONTENT: letter forms
letter names
listening for information
letter sounds
visual discrimination
sound matching
language
mathematics



II. REFERENCES 

CTBS Readiness Test: User's Handbook for the Reading Readiness Report
of Skill Mastery. Monterey, CA: CTB/McGraw-Hill, 1977.

CTBS Readiness Test: Examiner's Manual. Monterey, CA: CTB/McGraw-
Hill, 1977.

CTBS Readiness Test: Test Book. Monterey, CA: CTB/McGraw-Hill, 1977.

Findley, W. Review of Comprehensive Test of Basic Skills, Expanded
Edition, Buros' Eighth Mental Measurements Yearbook, pp. 40-43.

Nitko, A. Review of Comprehensive Test of Basic Skills, Expanded
Edition, Buros' Eighth Mental Measurements Yearbook, pp. 43-45.






